AD-A185  818 


AFOSR-TR-87-1528


STOCHASTIC APPROXIMATION AND
LARGE DEVIATIONS: GENERAL RESULTS
FOR w.p.1 CONVERGENCE


Paul Dupuis and Harold J. Kushner
February 1987  LCDS/CCS #87-21


Paul Dupuis+
and

Harold J. Kushner++


Lefschetz  Center  for  Dynamical  Systems 
Division  of  Applied  Mathematics 
Brown  University 
Providence,  Rhode  Island  02912 


February  1987 


+Research supported in part by contracts ONR N-00014-83-K-0542,
ARO-DAAG29-84-K-0082, and NSF DMS-851J470.


++Research supported in part by contract AFOSR-85-0315, NSF
ECS-8505674, and ARO GC-A-662998.


ABSTRACT 


W.p.1 convergence results are obtained for stochastic recursive
approximation algorithms under very general conditions. The gain
sequence $\{a_n\}$ can go to zero very slowly, and state-dependent noise,
discontinuous dynamical equations and the projected or constrained
algorithm are all treated. The basic technique is the theory of large
deviations. Prior results obtained via this theory are extended in many
directions. Let $\dot x = \bar b(x)$ denote the 'mean' equation for the algorithm, let
$\delta > 0$ be given, and let $G(\theta)$ be a neighborhood of a stable point $\theta$ of
that ODE. Then asymptotic upper bounds to
$a_N \log P\{X_n \notin G(\theta), \text{ some } n \ge N \mid |X_N - \theta| \le \delta\}$
are obtained. These are often more informative than the usual
classical rate of convergence results (which use a 'local linearization') and,
furthermore, are obtained for the constrained and non-smooth cases, for
which there are no 'rate of convergence' results.


Key Words: Stochastic approximation, large deviations, recursive algorithms,
errors for tracking systems

AMS  #  60F10,  62L20,  93E10,  93E12 


1. INTRODUCTION


We obtain w.p.1 convergence results as well as useful (non-classical)
estimates of 'rate of convergence' for fairly general stochastic approximation
(SA) processes of the form (1.1), via the theory of large deviations ($R^r$ =
Euclidean r-space).

(1.1)  $X_{n+1} = X_n + a_n b_n(X_n, \xi_n)$,  $X_n \in R^r$,  $0 < a_n \to 0$,  $\sum_n a_n = \infty$.

We also treat the projection algorithm (1.2), where $\Pi_G(y)$ denotes the
point of a compact convex set $G$ nearest to $y$.

(1.2)  $X_{n+1} = \Pi_G(X_n + a_n b_n(X_n, \xi_n))$.

Such  algorithms  have  been  the  subject  of  considerable  attention  [1]  -  [4],  under 
a  great  variety  of  conditions.  They  appear  in  various  guises  in  many  places 
in  control  and  communication  theory. 
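
As a purely illustrative aside, the recursions (1.1) and (1.2) can be sketched in a few lines of Python. Everything specific below is an assumption made for the illustration and is not taken from the paper: the scalar toy dynamics b(x, xi) = -x + xi, the uniform noise on [-1, 1], the gain a_n = 1/n**gamma, and the function names run_sa and project_interval.

```python
import random


def project_interval(x, lo, hi):
    """Nearest point of the compact convex set G = [lo, hi] (the map Pi_G in (1.2))."""
    return min(max(x, lo), hi)


def run_sa(n_steps, x0=2.0, gamma=0.6, seed=0, bounds=None):
    """Iterate X_{n+1} = X_n + a_n * b(X_n, xi_n) with a_n = 1 / n**gamma.

    Assumed toy model (not from the paper): b(x, xi) = -x + xi with xi
    uniform on [-1, 1], so the mean ODE is dx/dt = -x with stable point 0.
    If bounds=(lo, hi) is given, the projected recursion (1.2) is used.
    """
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_steps + 1):
        a_n = 1.0 / n ** gamma            # slowly decreasing gain
        xi = rng.uniform(-1.0, 1.0)       # exogenous noise sample
        x = x + a_n * (-x + xi)           # unconstrained step (1.1)
        if bounds is not None:
            x = project_interval(x, *bounds)   # projection step (1.2)
    return x
```

Calling run_sa(100000) returns an iterate close to the stable point 0 of the assumed mean ODE, and passing bounds=(-1.0, 1.0) keeps every iterate inside G = [-1, 1].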

In (1.1), $\{\xi_n\}$ is a random process, which might be state dependent
itself and which takes values in a compact metric space $M$. The $b_n$ might
simply be a function of $X_n$, $\xi_n$. More generally, we allow $\{b_n\}$ to be a
sequence of vector valued ($R^r$), mutually independent, but not necessarily
stationary random fields parametrized by $X_n$, $\xi_n$. In this case $b_n$ is
characterized by the distribution function (which will depend on $n$ in the
non-stationary case)

(1.3)  $P\{b_n \in B \mid X_i, \xi_i, b_{i-1}, i \le n\} = P\{b_n \in B \mid X_n, \xi_n\}$.

We suppose that $|b_n| \le K < \infty$ for some constant $K$. There are many
applications where the random field notation is useful since it is awkward or
difficult to express explicitly all the random variables which might be
involved (e.g., the $X_n$, $\xi_n$ might determine other random variables which are
used, in turn, to calculate $X_{n+1}$ from $X_n$). For example, consider an adaptive
routing problem, where $X_n$ denotes the routing parameter and $\xi_n$ the (vector)
buffer occupancies at time $n$. Then $b_n$ might be a random variable which
depends on 'arrivals', 'completed services', 'acceptances of arrivals', etc. at time
$n$, and each of these might be related to $X_n$, $\xi_n$ only statistically, but the
exact relation is either too complicated to write (perhaps involving a sum of
indicator functions of various possible events) or not necessary to write.

If $b_n$ is simply a function of $X_n$, $\xi_n$ (i.e., $b(X_n,\xi_n)$), then we call it a
deterministic random field. Even in this case, the $\xi_n$ might be state dependent
or correlated, or $b(\cdot)$ might be discontinuous. If $\{b_n\}$ is a deterministic random
field, we write it simply as $b(X_n,\xi_n)$. Of course, since $\{\xi_n\}$ is a random
sequence, $\{b(X_n,\xi_n)\}$ is not deterministic in the usual sense.

Perhaps  the  weak  convergence  based  methods  [3],  [5],  [6]  are  the  most 
powerful  general  methods  for  dealing  with  the  asymptotic  properties  of  (1.1) 
or  (1.2).  The  conditions  for  the  validity  of  such  methods  are  often  readily 
verifiable.  One  common  approach  is  to  derive  an  ODE  (ordinary  differential 
equation) for the 'mean' dynamics $\dot x = \bar b(x) = E b(x,\xi)$ (where this is well
defined) and to show that the asymptotic path of $\{X_n\}$ is arbitrarily close to
that of the asymptotic solutions to $\dot x = \bar b(x)$ in the sense of the weak
convergence theory. Typically, under some stability property of the ODE, this
method locates the points (or point) near which $\{X_n\}$ spends 'nearly all of its
time'. Nevertheless, there is still considerable interest in actual w.p.1
convergence. A powerful method would use a weak convergence approach to
find the 'asymptotic' points or sets, and then use a 'local' method to show
w.p.1 convergence of $\{X_n\}$ to an appropriate stable point of the ODE, under
the  usual  condition  that  some  compact  set  in  its  domain  of  attraction  is 
entered  infinitely  often  (which  would  itself  often  be  shown  by  a  weak 
convergence  based  method). 

Among methods that can be used to prove w.p.1 convergence, those based
on the theory of large deviations have a number of advantages. They can
handle a more general (and much more 'slowly converging') gain sequence $\{a_n\}$
than the classical methods. (They can have difficulty with problems where
the qth moments of the $\xi_n$ or $b_n(x,\xi_n)$ grow too fast as $q \to \infty$ (say, faster than
those for $b_n$ = Gaussian), but this rarely seems to be a serious problem in
applications.) Due to recent advances in the theory of large deviations, we can
now also treat problems with state dependent noise and discontinuous
dynamics as well as constrained problems. These facts imply the availability
of a rather powerful technique for getting w.p.1 convergence. The state
dependent noise is more general than allowed in [3], [7]. The mathematical
development here seems to be more complicated than the powerful 'martingale'
based methods of [4], [8]. However, we can handle more slowly (and
erratically) converging gains, the constrained case, a different class of state
dependent noise cases, the random field model, and get a very informative
estimate of the rate of convergence even when the classical 'local' smoothness
conditions are violated. This latter point is particularly important.

Typically, the large deviations estimates involve both an upper and a
lower bound for a (suitably normalized) probability of a 'rare' event (say the
event that the stochastic approximation (asymptotically) escapes from a small
neighborhood of a stable point of $\dot x = \bar b(x)$). To get the w.p.1 convergence
here, only an upper bound is needed, and this allows a result under weaker
conditions than would be required if both bounds were desired. The upper
bound serves as a useful indicator of the rate of convergence, perhaps even
more useful than that obtained by the classical methods. It is often obtainable
even for non-stationary problems, in contrast to the classical 'rate' results.

The 'rate' calculated by the classical methods is just the asymptotic
variance of $(X_n - \theta)/\sqrt{a_n}$, where $\theta$ is the limit point. Its derivation requires
a certain 'regularity' in the way $a_n \to 0$, and a local expansion of the dynamics
about $\theta$. Assuming appropriate smoothness of $b$ for $x$ near $\theta$ (usually twice
differentiability of $b(x,\xi)$ at $x = \theta$, which is not needed by the large
deviations method), the classical rate depends only on the gradient of $E b(x,\xi)$
at $x = \theta$ and on the statistics of $\{b(\theta,\xi_n)\}$. In many applications, one is more
interested in a (suitably normalized) estimate of the probability that the path
$\{X_n, \infty > n \ge N\}$ will escape from some given neighborhood of $\theta$ for large $N$.
This would involve the full stabilizing effect of the dynamics and the
'destabilizing' effect of the noise in that interval, and such a useful estimate
is obtainable from our results. The likely escape routes are also of
interest, and are obtainable as the minimizers in (1.4) below.

Our rate estimate takes the following form. Let $D$ denote a compact set
in the domain of attraction of a stable point $\theta$ of the ODE and with $\theta \in D^0$,
the interior of $D$. Let $\delta > 0$ be given. Let $A_D(T)$ denote the set of
continuous functions $\phi(\cdot)$ with $|\phi(0) - \theta| \le \delta$ and $\phi(t) \notin D$ for some $t \le T$.
We will exhibit a function $L(\phi,\dot\phi,t) \ge 0$ which is zero iff $\dot\phi = \bar b(\phi)$ and a
function $S(T,\phi)$:

$S(T,\phi) = \int_0^T L(\phi(s),\dot\phi(s),s)\,ds$  (for $\phi$ absolutely continuous),
$S(T,\phi) = \infty$  (otherwise),

such that

(1.4)  $\limsup_{n \to \infty} a_n \log P\{X_m \notin D, \text{ some } m \ge n \mid |X_n - \theta| \le \delta\} \le - \inf_{\phi \in A_D(T),\, T > 0} S(T,\phi) < 0$.



The  right  hand  side  of  (1.4)  can  yield  estimates  that  are  very  useful  for  a 
‘rate’  of  convergence,  and  for  the  dependence  of  this  rate  on  the  behavior  of 
the  algorithm  in  the  set  of  interest  D,  as  well  as  for  the  comparison  of 
algorithms. 

In [9], [10], [11], sharp upper and lower bounds were obtained for SA
algorithms by the methods of large deviations theory, and a great deal of
useful information was presented concerning the bounds and the structure of
the H and L-functionals. These references required $a_n \to 0$ in special ways,
the noise was 'exogenous', and the dynamical term $b$ was a smooth function of
$x$. The methods were unable to handle the constrained problems. Strictly
speaking, the results in these references were not w.p.1 convergence results.
They dealt with the sequences of sequences $\{X^n_m, m \ge 0\}$, $n = 1,2,\ldots$, defined
by $X^n_{m+1} = X^n_m + a_{n+m} b(X^n_m, \xi_{n+m})$, $X^n_0 = x$. Although the analysis of such
processes is basic to the convergence result, we deal here with the actual
process itself. Also, since we are concerned with upper (large deviations)
bounds only, we use $\limsup$ to define the various functionals, rather than $\lim$, as
illustrated in the sequel. This allows a result under weaker conditions on
the $\{a_n\}$, $b_n$, as will be seen below.

The basic assumptions are stated in Section 2, and examples given to
illustrate some of them. The properties of the 'upper bound' which we use
instead of the usual log of the exponential moment (the H-functional) are
discussed. Some of the conditions ((A2.3) and (A2.6)) are stated in a
fairly general form, since they allow a simple proof of the main
convergence result, Theorem 3.1, not cluttered with all the details required
for all the special cases. It also facilitates the application of future
results in large deviations theory to the stochastic approximation problem.
In the sequel, we give considerable detail on verifiable sufficient
conditions for these assumptions. (A2.3) is a standard assumption in large
deviations theory (see also the remarks concerning it in Sections 2 and 6),
and it seems to be satisfied in all the examples of interest. Assumption
(A2.6) is of a 'large deviations' type itself, and the bulk of the paper is
actually devoted to sufficient conditions for it in 'non-smooth cases'
(Section 7), constrained and state-dependent noise (Section 5), smooth
dynamics and exogenous noise cases (Section 4).

The total picture is a w.p.1 convergence result with the associated 'escape'
probability estimates under quite general conditions.


2. BACKGROUND AND BASIC ASSUMPTIONS


In this section, we introduce some rather general assumptions which will
be used to prove the main convergence theorem in Section 3. Two of the
assumptions ((A2.3) and (A2.6)) are not easily verifiable, but are used simply
to facilitate the proofs in Section 3. We prefer to work with these
assumptions in this section, since the conditions and methods which guarantee
them differ from case to case. We will return to them in Sections 4 and 5,
where readily verifiable sufficient conditions for them are given for a
number of cases that cover a wide variety of applications.

Until Section 5, we work only with (1.1), the unprojected case. We say
that $\{\xi_n\}$ is 'exogenous' or 'non-state dependent' if for any $n$ and Borel set
$A \in \sigma\{\xi_i, i > n\}$, we have $P(A \mid \xi_i, i \le n) = P(A \mid \xi_i, X_i, i \le n)$. For the
'state-dependent' noise case, we use the model where the pair $(X_n, \xi_{n-1})$ is a
Markov process. This covers a large number of important applications, and
provides for a convenient analysis. For the state-dependent case, define the
one step transition function (2.1), which we suppose to be independent of $n$.

(2.1)  $P^x(\xi, A) = P\{\xi_n \in A \mid X_n = x, \xi_{n-1} = \xi\}$.

In the state-dependent case, a so-called 'fixed-x' process $\{\xi^x_n\}$ appears in the
analysis, exactly as for the weak convergence approach [5]. For each $x \in R^r$,
define $\{\xi^x_n\}$ as the M-valued Markov process whose transition function is
obtained by convolving $P^x(\xi, A)$.

We next define the 'large deviations' H-functional. For the case of
exogenous noise, define $F_n = \sigma(\xi_i, i < n)$ and let $E_{F_n}$ denote the expectation
conditioned on $F_n$. For the state-dependent noise case, let $E_\xi$ denote the
expectation given that the initial noise state equals $\xi$. We first define the
functionals for the case of constant gain $a_n = a > 0$, and then make the
alterations which are required when $a_n \to 0$. The following assumptions will be
used. Sufficient conditions are given in the remarks following, and in Appendix 1.

A2.1. Exogenous noise. The lim sup exists uniformly in $\omega$ and in $x$ (in any
compact set)*:

(2.2a)  $H(x,\alpha) = \limsup_{N,n} \frac{1}{n} \log E_{F_N} \exp\langle \alpha, \sum_{i=N+1}^{N+n} b_i(x,\xi_i) \rangle$.

State dependent noise case. The lim sup exists uniformly in $\xi \in M$ and in $x$
(in any compact set):

(2.2b)  $H(x,\alpha) = \limsup_{N,n} \frac{1}{n} \log E_\xi \exp\langle \alpha, \sum_{i=N+1}^{N+n} b_i(x,\xi^x_i) \rangle$.

A2.2. There is a continuous function $\bar b(\cdot)$ such that (exogenous noise, and
uniformly in $x$ in any compact set and in $\omega$) as $n, N \to \infty$

(2.3a)  $\frac{1}{n} \sum_{i=N+1}^{N+n} E_{F_N} b_i(x,\xi_i) \to \bar b(x)$,

(state dependent noise, and uniformly in $x$ in any compact set and in $\xi \in M$)

(2.3b)  $\frac{1}{n} \sum_{i=N+1}^{N+n} E_\xi b_i(x,\xi^x_i) \to \bar b(x)$.

*We say that the lim sup exists uniformly in $\omega$ if for any $\delta > 0$ there are $N_\delta$, $n_\delta$ such
that for $n \ge n_\delta$ and $N \ge N_\delta$, the r.h.s. of (2.2a) (without the lim sup) is $\le H(x,\alpha) + \delta$ w.p.1.

Remark. Since the $b_n$ are bounded, a condition equivalent to (2.2a) (with a
similar change in (2.2b)) is to use $E_{F_{N-m}}$ in lieu of $E_{F_N}$, where $m = o(n)$.

We use the following assumption on $H(\cdot,\cdot)$ and comment on it below and
in Appendix 1.

Define the dual function or Legendre transform

$L(x,\beta) = \sup_\alpha [\langle \alpha,\beta \rangle - H(x,\alpha)]$.

A2.3. (a) $L(\cdot,\cdot)$ is l.s.c. in both variables. (b) For each $x$, $H(x,\cdot)$ is
differentiable at $\alpha = 0$.

The $\alpha$-differentiability is a rather weak requirement. Some sufficient
conditions are given in the remark below. A more general approach appears
in Appendix 1 (Section 6). As discussed in Section 6, it is equivalent to the
condition that $L(x,\beta) = 0$ iff $\beta = \bar b(x)$, the mean value of the dynamics, and
seems to be satisfied in all examples of interest.

Remark on the l.s.c. of $L(\cdot,\cdot)$ in (A2.3). The l.s.c. property holds if
$H$ is continuous. Although conditions guaranteeing this continuity may
vary from case to case, it is often quite easy to prescribe mild sufficient
conditions for a given case. For example, if $b_i(x,\xi) = b(x,\xi)$, and if $b(x,\xi)$ is
continuous in $x$ (uniformly in $\xi$), then $H(x,\alpha)$ is continuous. Even if $b(\cdot,\xi)$ is
not continuous, it is often true that the noise provides enough 'smoothing' so
that for some $m \ge 0$ (not depending on $x$, $\omega$, $N$ or $n$) the functions

$D_{N,n}(x,\alpha) = E_{F_{N-m}} \exp\langle \alpha, \sum_{i=N+1}^{N+n} b_i(x,\xi_i) \rangle$

are continuous in $(x,\alpha)$ uniformly in the other variables. Under a mild
additional condition, this continuity will give us the l.s.c. property. First, we
show it for the stationary m-dependent case, where for any $j$, $\{\xi_i, i \le j\}$ and
$\{\xi_i, i \ge j + m\}$ are mutually independent.

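Define $D_q(x,\alpha) = E \exp \sum_{i=1}^{q} \langle \alpha, b_i(x,\xi_i) \rangle$ and $H_q(x,\alpha) = (\log D_q(x,\alpha))/(q+m)$.
Suppose that $D_q$ is continuous for each $q$. By the m-dependent property and
the stationarity,

$\frac{1}{kq+km} \log E_{F_{-m}} \exp \sum_{i=1}^{kq+km} \langle \alpha, b_i(x,\xi_i) \rangle$

$\le \frac{1}{kq+km} \log\Big[ \exp(|\alpha| K k m) \prod_{l=1}^{k} E \exp \sum_{i=1}^{q} \langle \alpha, b_i(x,\xi_i) \rangle \Big]$

$= \delta_q |\alpha| + H_q(x,\alpha)$,

where $\delta_q = Km/(q+m) \to 0$ as $q \to \infty$. Thus $H(x,\alpha) \le H_q(x,\alpha) + \delta_q|\alpha|$.

To show the l.s.c. property of $L$, proceed as follows. Let $\beta_i \to \beta$, $x_i \to x$ and
write

$\liminf_i L(x_i,\beta_i) \ge \liminf_i \sup_{|\alpha| \le M} [\langle \alpha,\beta_i \rangle - H_q(x_i,\alpha) - \delta_q|\alpha|]$

$= \sup_{|\alpha| \le M} [\langle \alpha,\beta \rangle - H_q(x,\alpha) - \delta_q|\alpha|]$.

Now, let $q \to \infty$ (so that $H_q(x,\alpha) + \delta_q|\alpha|$ can be replaced by $H(x,\alpha)$), and then let
$M \to \infty$ to get by monotonicity that

$\liminf_i L(x_i,\beta_i) \ge \sup_\alpha [\langle \alpha,\beta \rangle - H(x,\alpha)] = L(x,\beta)$,

which is the l.s.c. result.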

It is also simple to prove the $\alpha$-differentiability (A2.3b) for this
m-dependent and stationary model (even without the continuity in $x$). At
$\alpha = 0$, the gradient of $H_q(x,\alpha)$ equals $\sum_{i=1}^{q} E b_i(x,\xi_i)/(q+m)$, which converges to
a limit $\bar b(x)$ as $q \to \infty$. Since the convex (in $\alpha$) function $H(x,\alpha)$ is bounded
above by the convex functions $H_q(x,\alpha) + \delta_q|\alpha|$, and since $H(x,0) = H_q(x,0) = 0$,
the set of subdifferentials of $H(x,\cdot)$ at $\alpha = 0$ is contained in the set of
subdifferentials of $H_q(x,\cdot) + \delta_q|\cdot|$ for every $q$. This latter set converges to the
point $\bar b(x)$ as $q \to \infty$. Hence $H(x,\cdot)$ has $\bar b(x)$ as its unique subdifferential at $\alpha = 0$,
which implies that $H_\alpha(x,0)$ exists and equals $\bar b(x)$.

A proof similar to that above can be employed to get the l.s.c. of
$L(\cdot,\cdot)$ if the $D_{N,n}(\cdot,\cdot)$ are continuous for some $m \ge 0$ (not depending on $N$,
$n$, $x$, $\omega$) and the lim sup in (A2.1) is attained in the following uniform way:
Let there be $\delta(N_0,N_1,n_0,n_1)$ (which do not depend on $x$, $\omega$ or $\alpha$) which goes
to zero as $N_1 - N_0 \to \infty$, $n_1 - n_0 \to \infty$, $n_0 \to \infty$, $N_0 \to \infty$ and such that

$\Big| H(x,\alpha) - \sup_{N_1 \ge N \ge N_0,\ n_1 \ge n \ge n_0} \frac{1}{n} \log D_{N,n}(x,\alpha) \Big| \le \delta(N_0,N_1,n_0,n_1)(|\alpha| + 1)$.

This condition does not seem particularly restrictive.


Remark on the calculation of the derivative $H_\alpha(x,0)$ in (A2.3). We
show how to calculate the value of the derivative, given that it exists.
The derivative plays a crucial role in the sequel, since it defines the 'mean
dynamics' for the algorithm (1.1). The following readily verified facts
about convex functions will be used to get $H_\alpha(x,0)$ in terms of the
statistics of $\{b_n(x,\xi_n)\}$ or $\{b_n(x,\xi^x_n)\}$.

(i) Let $\{f_i(\cdot)\}$ be convex on $R^r$ and satisfy $f_i(0) = 0$. Then $\sup_i f_i(\alpha)$ is
differentiable at $\alpha = 0$ only if each $f_i(\cdot)$ is and the gradient $f_{i,\alpha}(0)$ does not
depend on $i$.

(ii) Let each $f_i(\cdot)$ in (i) be differentiable at $\alpha = 0$ and let $f(\alpha) = \lim_i f_i(\alpha)$
exist. If $f(\cdot)$ is differentiable at $\alpha = 0$, then $f_\alpha(0) = \lim_i f_{i,\alpha}(0)$.

Now we use (i), (ii) and the limit assumptions (A2.1), (A2.2) and the
differentiability assumption in (A2.3) to calculate $H_\alpha(x,0)$. By definition

$\limsup_{N \to \infty,\ n \to \infty} = \lim_{N_0 \to \infty,\ n_0 \to \infty} \ \sup_{N \ge N_0,\ n \ge n_0}$.

This, together with the above facts and assumptions, allows us to calculate
$H_\alpha(x,0)$ as follows. Write $H(x,\alpha)$ in the form

$\limsup_{N,n} \Big[ \frac{1}{n} \log E_{F_N} \exp\langle \alpha, \sum_{i=N+1}^{N+n} (b_i(x,\xi_i) - E_{F_N} b_i(x,\xi_i)) \rangle + \frac{1}{n} \sum_{i=N+1}^{N+n} \langle \alpha, E_{F_N} b_i(x,\xi_i) \rangle \Big]$.

The result that $H_\alpha(x,0) = \bar b(x)$ follows by noting that the derivative (at $\alpha = 0$)
of the logarithmic term to the right of the first $1/n$ is zero for all $n$, $N$ and using
(A2.2). In fact, what we have really shown is that (A2.1) and (A2.3) imply
the existence of $\bar b(x)$ such that (2.3a) holds. An analogous calculation
works for the state-dependent noise case.


Example 1. Consider the simplest case, where $b_i(x,\xi) = b_i(x)$. Then, under
(2.3a),

$\bar b(x) = \lim_{n,N} \frac{1}{n} \sum_{i=N+1}^{N+n} E b_i(x)$.

If the $b_i(x)$ are identically distributed for each $x$, then $\bar b(x) = E b_i(x)$ and
$H(x,\alpha) = \log E \exp\langle \alpha, b_i(x) \rangle$, and lim sup = lim in (A2.1). If the measure induced
by $b_i(x)$ on $R^r$ is weakly continuous in $x$ (as is rather common in applications),
then $H(\cdot,\cdot)$ is continuous. The rate of convergence estimates for classical
stochastic approximation [1] - [3], [6] do not cover this case unless $b_i(\cdot)$ is an
appropriately smooth function of $x$. Thus, even in this simple case, which
covers many applications where the $b_i$ involve (e.g.) indicator functions, we can
get a rate estimate unattainable via the classical theory.


Example 2. The case of Example 1, but where $\{b_i(x)\}$ are not identically
distributed. Define $H_i(x,\alpha) = \log E \exp\langle \alpha, b_i(x) \rangle$. Then $\bar b(\cdot)$ is still given by
(A2.2) and

$H(x,\alpha) = \limsup_{N,n} \frac{1}{n} \sum_{i=N+1}^{N+n} H_i(x,\alpha)$,

which exists and is differentiable at $\alpha = 0$. The noise process here is
non-stationary, but we can still get our 'rate' estimate. The example also
illustrates that the use of lim sup rather than lim in (A2.1) is of much more than
academic interest. If the measures of the $b_i(x)$ are weakly continuous in $x$,
uniformly in $i$, then $\bar b(\cdot)$ is continuous.

Example 3. Remarks on the use of lim sup rather than lim in (A2.1). The
use of lim sup is somewhat equivalent to taking a worst case. For example, let
$b_n(x,\xi) = \bar b(x) + \xi$, where $\{\xi_n\}$ is a sequence of zero mean mutually independent
Gaussian random variables with covariances $\{\Sigma_n\}$. Since

$\frac{1}{n} \log E \exp\langle \alpha, \sum_{i=N+1}^{N+n} (\bar b(x) + \xi_i) \rangle = \langle \alpha, \bar b(x) \rangle + \frac{1}{2n} \sum_{i=N+1}^{N+n} \langle \alpha, \Sigma_i \alpha \rangle$,

the lim sup in (A2.1) is just $\langle \alpha, \bar b(x) \rangle + \alpha'\Sigma\alpha/2$, where $\Sigma$ is the lim sup of
$\frac{1}{n} \sum_{i=N+1}^{N+n} \Sigma_i$ in the
sense of non-negative definite matrices. In many problems, the dynamics are
stable enough so that if the noise terms are multiplied by some factor (to take, say,
$\Sigma_n$ to $\Sigma$) we still have the required 'stability' to get the desired w.p.1 convergence.

Example 4. Let $b_i(x,\xi)$ be simply a function $b(x,\xi)$ (i.e., a deterministic
random field), and consider the case of Markov state-dependent noise, with
one-step transition function $P^x(\xi,\cdot)$. Under a uniform (in the initial condition)
recurrence condition on the fixed-x process $\{\xi^x_n\}$ and continuity of $b(x,\xi)$ in $x$
(uniformly in $\xi$), the following facts are proved in [24]. Let $C(M)$ denote the
continuous real valued functions on $M$ and define an operator $\hat P(x,\alpha)$ mapping
$C(M) \to C(M)$ by

(2.4)  $\hat P(x,\alpha)(f)(\xi) = \int_M \exp\langle \alpha, b(x,\psi) \rangle f(\psi) P^x(\xi, d\psi)$.

The eigenvalue $\lambda(x,\alpha)$ of $\hat P(x,\alpha)$ with the maximum modulus is real, simple and
larger than unity for $\alpha \ne 0$. Also $H(x,\alpha) = \log \lambda(x,\alpha)$ and $H(x,\alpha)$ is analytic in
$\alpha$. If the right side of (2.4) is continuous in $x$ for each $f(\cdot)$, $\alpha$, $\xi$, then $H(\cdot,\cdot)$
is continuous. Also $\bar b(x) = \int b(x,\xi) \mu^x(d\xi)$, where $\mu^x(\cdot)$ is the unique invariant
measure of $\{\xi^x_n\}$, and lim sup = lim in (A2.1).
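
The eigenvalue characterization in Example 4 can be illustrated numerically when $M$ is a finite set, so that the operator in (2.4) becomes a matrix. The sketch below is only an illustration under assumptions that are not in the paper: a two-state chain whose transition matrix P does not depend on x, scalar values b_vals standing for b(x, psi) at a fixed x, and a plain power iteration for the largest eigenvalue. The helper names tilted_log_eigenvalue and derivative_at_zero are hypothetical.

```python
import math


def tilted_log_eigenvalue(P, b_vals, alpha, iters=500):
    """Return log of the largest eigenvalue of Q[i][j] = exp(alpha*b_vals[j]) * P[i][j].

    Q is the finite-state analog of the operator in (2.4). Power iteration
    is used, which converges here because all entries of Q are positive.
    """
    k = len(P)
    Q = [[math.exp(alpha * b_vals[j]) * P[i][j] for j in range(k)]
         for i in range(k)]
    f = [1.0] * k
    lam = 1.0
    for _ in range(iters):
        g = [sum(Q[i][j] * f[j] for j in range(k)) for i in range(k)]
        lam = max(g)                  # eigenvalue estimate (max-norm scaling)
        f = [v / lam for v in g]      # renormalized eigenfunction estimate
    return math.log(lam)


def derivative_at_zero(P, b_vals, h=1e-4):
    """Central-difference estimate of dH/dalpha at alpha = 0."""
    return (tilted_log_eigenvalue(P, b_vals, h)
            - tilted_log_eigenvalue(P, b_vals, -h)) / (2.0 * h)


# Assumed toy chain: its invariant measure is (2/3, 1/3).
P = [[0.9, 0.1], [0.2, 0.8]]
b_vals = [1.0, -1.0]
```

For this toy chain the invariant mean of b_vals is 2/3 - 1/3 = 1/3, and derivative_at_zero(P, b_vals) agrees with that value to several digits, which matches the statement that the derivative of H at alpha = 0 gives the mean dynamics.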

These various examples can be combined and extended. Other examples
are in Sections 5 and 6, and in [11] and [24].

The Limit ODE and Properties of the L-Function. In the expression
(2.3), defining the mean 'dynamics' of (1.1), the terms are weighted equally.
This corresponds to the case $a_n = a$. We will see in Section 3 that, under a
simple 'asymptotic continuity' condition on $\{a_n\}$, $\bar b(\cdot)$ also yields the
appropriate 'mean' dynamics when $a_n \to 0$. In order to get any sort of useful
convergence for $\{X_n\}$, the ODE

(2.5)  $\dot x = \bar b(x)$

must have at least one stable point. We assume:




Define $t_0 = 0$, $t_n = \sum_{i=0}^{n-1} a_i$ and $m(t) = \max\{n : t_n \le t\}$. We will require the
following 'asymptotic continuity' assumption on the sequence $\{a_n\}$.

A2.5.  $\lim_{n,m \to \infty,\ |t_n - t_m| \to 0} a_n/a_m = 1$.

For every $N$ we define $K_N(s) = a_{m(t_N+s)}/a_N$. It follows from A2.5 that
given $\delta > 0$ there is $c(\delta) > 0$ and $N(\delta) < \infty$ such that $N \ge N(\delta)$ and
$|t - s| \le c(\delta)$ imply $|K_N(t) - K_N(s)| \le \delta$. Define $K(t) = \limsup_N K_N(t)$. Then A2.5
implies $K(t)$ is continuous and satisfies $0 < K(t) < \infty$ for $0 \le t < \infty$.

Examples. Let $a_n = 1/n$. Then $m(t_n + s)/(n \exp s) \to 1$ as $n \to \infty$ and
$K_N(s) \to \exp(-s)$. Let $a_n = 1/n^\gamma$, $\gamma \in (0,1)$. Then $m(t_n+s)/(n + s n^\gamma) \to 1$
as $n \to \infty$ and $K_N(s) \to 1$. If $a_n = c/\log n$, then $m(t_n+s)/(n+s) \to 1$ and $K_N(s) \to 1$.
In general, if $a_n$ is nonincreasing, then $K(s) \le 1$.
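
The limits quoted in these examples can be checked numerically. The following sketch evaluates $K_N(s) = a_{m(t_N+s)}/a_N$ directly from its definition; the function name k_N and the convention of summing the gains from index N onward (which avoids evaluating a gain at index 0) are assumptions made for the illustration.

```python
import math


def k_N(gain, N, s):
    """Compute K_N(s) = a_{m(t_N + s)} / a_N for a gain function n -> a_n.

    Here m(t_N + s) is the largest n with t_n - t_N <= s, where
    t_n - t_N = a_N + a_{N+1} + ... + a_{n-1}.
    """
    n, elapsed = N, 0.0
    while elapsed + gain(n) <= s:     # t_{n+1} - t_N still within s
        elapsed += gain(n)
        n += 1
    return gain(n) / gain(N)


harmonic = lambda n: 1.0 / n          # a_n = 1/n, so K_N(s) -> exp(-s)
root = lambda n: 1.0 / n ** 0.5       # a_n = 1/n**gamma with gamma = 1/2
log_gain = lambda n: 1.0 / math.log(n)   # a_n = c/log n with c = 1
```

With N = 100000 and s = 1, k_N(harmonic, N, 1.0) is close to exp(-1), while k_N(root, 10**6, 1.0) and k_N(log_gain, 10**6, 1.0) are both close to 1, in line with the examples above.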

The H and L-Functionals for Non-Constant $\{a_n\}$. We next define the
analog of the $H(x,\alpha)$ for our case of non-constant $\{a_n\}$. Owing to the fact that
$a_n$ is not constant, the H and L functionals will depend on time, if $K(t)$ is not
equal to unity. Define the 'centered' H-functional

$H_0(x,\alpha) = H(x,\alpha) - \langle \alpha, \bar b(x) \rangle$

and set

(2.7)  $H(x,\alpha,s) = K^{-1}(s) H_0(x, K(s)\alpha) + \langle \alpha, \bar b(x) \rangle$.

The definition (2.7) and (A2.2), (A2.3) imply the differentiability of $H(x,\cdot,s)$ at
$\alpha = 0$ with $\bar b(x) = H_\alpha(x,0,s)$ (it will not actually depend on $s$). Let $L(x,\beta,s)$
denote the dual of $H(x,\alpha,s)$:

$L(x,\beta,s) = \sup_\alpha [\langle \alpha,\beta \rangle - H(x,\alpha,s)]$.

Using (2.7), we see that

$L(x,\beta,s) = \sup_\alpha [\langle \alpha,\beta \rangle - K^{-1}(s) H_0(x,K(s)\alpha) - \langle \alpha, \bar b(x) \rangle]$

$= K^{-1}(s) L_0(x, K^{-1}(s) K(s) (\beta - \bar b(x)))$

$= K^{-1}(s) L(x,\beta)$,

where

$L_0(x,\beta) = \sup_\alpha [\langle \alpha,\beta \rangle - H_0(x,\alpha)] = L(x, \beta + \bar b(x))$.

(A2.3) and Lemma 2.1 then imply that $L$ has the following properties:

(i) $L(\cdot,\cdot,\cdot) \ge 0$.

(ii) $L(x,\beta,s) = 0$ iff $\beta = \bar b(x)$.

(iii) $L(x,\beta,s)$ is jointly l.s.c. in $(x,\beta)$.

(iv) $L(x,\beta,s) = \infty$ if $|\beta| > K$.
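
The scaling identity $L(x,\beta,s) = K^{-1}(s) L(x,\beta)$ and property (ii) can be checked on a simple case. The sketch below assumes a scalar Gaussian model with $H(\alpha) = \alpha \bar b + \sigma^2\alpha^2/2$ at a fixed $x$, for which $L(\beta) = (\beta - \bar b)^2/(2\sigma^2)$; the Gaussian choice, the grid search, and the names make_H and legendre are illustrative assumptions. Gaussian noise is unbounded, so property (iv) is not expected to hold for it.

```python
def make_H(b_bar, sigma, K=1.0):
    """Return alpha -> K^{-1} * H_0(K*alpha) + alpha*b_bar, as in (2.7).

    Assumed centered functional: H_0(alpha) = sigma^2 * alpha^2 / 2.
    K = 1.0 recovers the constant-gain H(x, alpha) at a fixed x.
    """
    def H(alpha):
        H0 = 0.5 * sigma ** 2 * (K * alpha) ** 2
        return H0 / K + alpha * b_bar
    return H


def legendre(H, beta, alpha_max=50.0, n_grid=100001):
    """Approximate L(beta) = sup_alpha [alpha*beta - H(alpha)] on a uniform grid."""
    best = -float("inf")
    for k in range(n_grid):
        alpha = -alpha_max + 2.0 * alpha_max * k / (n_grid - 1)
        best = max(best, alpha * beta - H(alpha))
    return best


b_bar, sigma = 0.5, 1.0
L_const = legendre(make_H(b_bar, sigma), 1.5)          # constant gain
L_time = legendre(make_H(b_bar, sigma, K=0.5), 1.5)    # K(s) = 0.5
L_at_mean = legendre(make_H(b_bar, sigma), b_bar)      # beta = b_bar
```

In this toy case L_time equals L_const divided by K(s) = 0.5, and L_at_mean is zero, as the displayed identity and property (ii) predict.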

We now define a large deviation action functional for (1.1). Let $C[0,T]$
denote the space of $R^r$-valued continuous functions on $[0,T]$. Then for $\phi \in
C[0,T]$, define the functional

(2.8)  $S_x(T,\phi) = \int_0^T L(\phi(s),\dot\phi(s),s)\,ds$

if $\phi$ is absolutely continuous and $\phi(0) = x$, and set $S_x(T,\phi) = \infty$ otherwise.

In the sequel, all functionals of the type (2.8) are assumed to take the
value $+\infty$ if $\phi(\cdot)$ is not absolutely continuous or $\phi(0) \ne x$.


For purposes of the next assumption and the proof in Section 3, it is
convenient to define (for each $N$ and $x$) the form $\{X^{N,x}_n, n \ge N\}$ of (1.1), which
starts at time $N$ with initial condition $X^{N,x}_N = x$ and then satisfies

(2.9)  $X^{N,x}_{n+1} = X^{N,x}_n + a_n b_n(X^{N,x}_n, \xi_n)$,  $n \ge N$.

In the exogenous case, the noise in (2.9) is the same as in (1.1), while in the
state dependent case we will specify the initial noise value in $M$. Define the
continuous parameter interpolations of the processes (1.1) and (2.9):

(2.10)  $X(t) = [(t - t_n) X_{n+1} + (t_{n+1} - t) X_n]/a_n$,  $t \in [t_n, t_n + a_n] = [t_n, t_{n+1}]$,

(2.11)  $X^{N,x}(t) = [(t - (t_n - t_N)) X^{N,x}_{n+1} + ((t_{n+1} - t_N) - t) X^{N,x}_n]/a_n$,  $t \in [t_n - t_N, t_{n+1} - t_N]$,  $n \ge N$.

We will use (A2.6) below, given in terms of the processes $X^{N,x}(\cdot)$. The
assumption is certainly not readily verifiable, but it allows a general proof of
the w.p.1 convergence and the upper bound to the convergence rate given in
Section 3. It is convenient to use the condition as it is stated, since it is the
key condition in Theorem 3.1, and in different cases, different sets of
conditions would have to replace it. In Sections 4 and 5 we devote
considerable attention to a series of verifiable conditions for (A2.6), and cover
a large number of interesting cases. Let $C_x[0,T]$ denote the set of continuous
$R^r$-valued functions on $[0,T]$ with initial value $x$, and with the sup norm
topology.


A2.6. Let $s > 0$, $\delta > 0$, $T > 0$ and compact $F \subset A^0$ (the interior of $A$) be given.
Then there is $N_0 < \infty$ such that for any $x \in F$, any set $A \subset C_x[0,T]$ satisfying
$\inf_{\phi \in A} S_x(T,\phi) \ge s$, and any $N \ge N_0$, we have

(2.13a)  $a_N \log P\{X^{N,x}(\cdot) \in A \mid \mathcal{F}_N\} \le -s + \delta$

(2.13b)  $a_N \log P\{X^{N,x}(\cdot) \in A \mid \xi_N = \xi\} \le -s + \delta$

for almost all $\omega$ and all $\xi \in M$ in the cases of exogenous noise and state
dependent noise, respectively.

Remark. The uniformity of the estimates with respect to $\omega$ (respectively
$\xi$) implies that (2.13a) (resp. (2.13b)) continues to hold if we replace $N$ by any
stopping time $M \ge N_0$.

Finally, we state the slowest rate at which we can allow $a_n \to 0$.

A2.7. For every $\delta > 0$, $\sum_n \exp(-\delta/a_n) < \infty$; $\sum_n a_n = \infty$.

For example, let $a_n = c_n/\log n$, and $c_n \to 0$ with $\sum_n a_n = \infty$. Then (A2.7)
holds.
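To see why the factor $c_n \to 0$ matters in the example, note that with $a_n = 1/\log n$ the terms of the first series are $\exp(-\delta \log n) = n^{-\delta}$, which is not summable for $\delta \le 1$, whereas $a_n = 1/n$ gives the geometric terms $e^{-\delta n}$. The sketch below only computes partial sums for these two comparison gains; the helper name and the choice of gains are assumptions made for illustration, and a finite partial sum is of course not a proof of convergence or divergence.

```python
import math


def a27_partial_sum(gain, delta, n_max, n_min=2):
    """Partial sum of exp(-delta / a_n) for n = n_min, ..., n_max."""
    return sum(math.exp(-delta / gain(n)) for n in range(n_min, n_max + 1))


def log_gain(n):
    """a_n = 1/log n: the borderline gain, without the factor c_n -> 0."""
    return 1.0 / math.log(n)


def harmonic_gain(n):
    """a_n = 1/n: a classical gain sequence."""
    return 1.0 / n


# With delta = 1 the first sum is a harmonic sum (it keeps growing with n_max),
# the second is a geometric series with a finite limit.
slow = a27_partial_sum(log_gain, 1.0, 1000)
fast = a27_partial_sum(harmonic_gain, 1.0, 200)
```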
3. THE BASIC CONVERGENCE THEOREM

The  following  lemma  gives  several  important  properties  of  our  action 
functional. 

Lemma 3.1. Assume (A2.1) to (A2.3). Then for any $T > 0$:

(i) $S_{\phi(0)}(T,\phi)$ is l.s.c. in $\phi \in C[0,T]$.

(ii) For any compact set $F \subset R^r$, and any $\infty > s \ge 0$, the set

$G = \bigcup_{x \in F} \{\phi: S_x(T,\phi) \le s\}$

is compact.

(iii) $S_x(T,\phi) = 0$ iff $\dot\phi = \bar b(\phi)$ (a.s.) in $[0,T]$, and $\phi(0) = x$.

(iv) For each $\epsilon > 0$ and $T < \infty$ there is a $\delta > 0$ such that $|\beta - \bar b(x)| \ge \epsilon$
implies $L(x,\beta,s) \ge \delta$ on $[0,T]$.

Proof. (i) See [14; Theorem 3, Section 9.1.4].

(ii) Recall that $|\beta| > K$ implies $L(x,\beta,s) = \infty$ for all $s \ge 0$. It
follows that $\phi \in G$ implies that $\phi$ is Lipschitz continuous with constant $\le K$.
Ascoli's theorem then implies that $G$ is precompact, and (ii) now
follows from (i).

(iii) $S_x(T,\phi) = 0$ iff $L(\phi(s),\dot\phi(s),s) = 0$ a.s. in $[0,T]$. Since $L(x,\beta,s) = 0$ iff
$\beta = \bar b(x)$, $S_x(T,\phi) = 0$ iff $\dot\phi = \bar b(\phi)$ a.s.

(iv) It is enough to work with $L(x,\beta)$. Let $x_n \to x$, $\beta_n \to \beta$ such that
$|\beta_n - \bar b(x_n)| \ge \epsilon > 0$ and $L(x_n,\beta_n) \to 0$. By the l.s.c. properties of $L$,
$\liminf_n L(x_n,\beta_n) \ge L(x,\beta)$, which equals zero only if $\beta = \bar b(x)$. □


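We now present the convergence theorem.

Theorem 3.1. Assume (A2.3) to (A2.7), and that given any
neighborhood $G(\theta)$ of $\theta$ such that $G(\theta) \subset A^0$ there is (a.s.) a random sequence $n_i \to \infty$
such that $X_{n_i} \in G(\theta)$.
Then $X_n \to \theta$ w.p.1.

Assume in addition that given $\epsilon > 0$ there is $N_\epsilon < \infty$ such that
$a_i/a_N \le 1 + \epsilon$ for all $i \ge N \ge N_\epsilon$. Then

(3.1)  $\limsup_N a_N \log P\{X_n \notin G(\theta), \text{ some } n \ge N \mid X_N \in N_\delta(\theta)\} \le -\inf S_{\phi(0)}(t,\phi) \equiv -S^*$,

where the infimum is over all $\phi$ with $|\phi(0) - \theta| \le \delta$ and $\phi(t) \notin G(\theta)$ for some $t < \infty$.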

Remarks. If not all paths visit some neighborhood of $\theta$ infinitely
often (i.o.) then we will have $X_n \to \theta$ w.p.1 with respect to those paths
which do. It is expected that the recurrence condition would be verified
by a weak convergence argument. Under the last assumption of the
theorem, $K(t) \le 1$, which implies $L(x,\beta,t) \ge L(x,\beta)$. It is then simple
to show (see the arguments below) that for small $\delta > 0$ the r.h.s. of (3.1)
is strictly negative. In particular, if $a_n$ is nonincreasing, then $K(t) \le 1$.

Proof. For $\delta > 0$, let $N_\delta(\theta)$ denote $\{x: |\theta - x| \le \delta\}$. We will first prove
that if $\{X_n\}$ visits $G(\theta)$ infinitely often w.p.1, then $\{X_n\}$ visits $N_\delta(\theta)$ infinitely
often w.p.1. We can suppose that $N_{2\delta}(\theta) \subset G(\theta)$.

Owing to the stability assumption (A2.4), there is $T_1 < \infty$ such that if $\phi$
satisfies $\dot\phi = \bar b(\phi)$ and $\phi(0) = x \in G(\theta)$ then $\phi(t) \in N_{\delta/2}(\theta)$ for $t \ge T_1$. Define
[...] it is sufficient to show that


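(3.3)  $\lim_n P\{X_{n+i} \notin N_{\delta_1}(\theta), \text{ some } i < \infty \mid X_n \in N_{\delta_2}(\theta)\} = 0$.

We have the obvious inclusion

$\{X_{n+i} \notin N_{\delta_1}(\theta)$ for some $i < \infty$ and $X_n \in N_{\delta_2}(\theta)\}$

$\subset \{X_i \notin N_{\delta_1}(\theta)$ for some $m(jT_2 + t_n) \le i \le m(jT_2 + T_2 + t_n)$
and/or $X_{m(jT_2+T_2+t_n)} \notin N_{\delta_2}(\theta)$, some $0 \le j < \infty$, and $X_n \in N_{\delta_2}(\theta)\}$

$\equiv \bigcup_{j \ge 0} E_j^n \cap \{X_n \in N_{\delta_2}(\theta)\}$.

It follows that

(3.4)  $P\{X_{n+i} \notin N_{\delta_1}(\theta), \text{ some } i < \infty \mid X_n \in N_{\delta_2}(\theta)\} \le \sum_{j=0}^{\infty} P\{E_j^n \mid \bigcap_{i<j} (E_i^n)^c \cap \{X_n \in N_{\delta_2}(\theta)\}\}$.

Note that for any fixed $j$ inclusion in the conditioning set implies
$X_{m(jT_2+t_n)} \in N_{\delta_1}(\theta)$. Thus (3.3) follows from (3.4), (A2.6), and (A2.7).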


We now consider (3.1). Let $\bar T > 0$ be fixed. Define the set of paths

$A(T) = \{\phi: |\phi(0) - \theta| \le \delta$, $\phi(t) \notin G(\theta)$ for some $t \le T$, and
$|\phi(t) - \theta| \ge \delta/2$ for $\bar T \le t \le T\}$.

We claim that for large enough $T$,

(3.5)  $\inf_{\phi \in A(T)} S_{\phi(0)}(T,\phi) \ge S^*$.

First note that the same proof as that of (3.2) implies there is $c_3 > 0$ and
$T_3 < \infty$ such that if we define $A_3 = \{\phi: \phi(0) \in G(\theta), \phi(T_3) \notin N_{\delta/2}(\theta)\}$, then

$\inf_{\phi \in A_3} \int_0^{T_3} L(\phi(s),\dot\phi(s))\,ds \ge c_3$.

Let $i$ = the integer part of $(T - \bar T)/T_3$. Then for the paths in $A(T)$ that do
not escape from $G(\theta)$, we have

$S_{\phi(0)}(T,\phi) \ge i c_3$,

which implies (3.5) (when $T$ is large).

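Now define the stopping times $\tau_i^N$ by $\tau_0^N = N$,
$\tau_{i+1}^N = \inf\{n \ge m(t_{\tau_i^N} + \bar T): X_n \in N_\delta(\theta) \text{ or } X_n \notin G(\theta)\}$, and the events
$E_i^N = \{X_{\tau_{i+1}^N} \notin G(\theta) \text{ or } t_{\tau_{i+1}^N} - t_{\tau_i^N} > T\}$. We use the following estimate,
which is derived in the same way as (3.4).

(3.6)  $P\{X_n \notin G(\theta), \text{ some } n \ge N \mid X_N \in N_\delta(\theta)\} \le \sum_{j=0}^{\infty} P\{E_j^N \mid \bigcap_{i<j} (E_i^N)^c \cap \{X_N \in N_\delta(\theta)\}\}$.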

Fix $h_1 > 0$. By (A2.6) and (3.5), an upper bound to the r.h.s. of (3.6) is
given by

$\sum_{i=N}^{\infty} \exp[-(S^* - h_1)/a_i]$

$= \exp[-(S^* - h_2)/a_N] \sum_{i=N}^{\infty} \exp[-(S^* - h_1)/a_i + (S^* - h_2)/a_N]$

when $N$ is large. Thus (3.1) follows if we prove that given $h_2 > 0$ there
is $h_1 > 0$, $\bar N < \infty$, and $M < \infty$ so that for $N \ge \bar N$,

(3.7)  $\sum_{i=N}^{\infty} \exp[-(S^* - h_1)/a_i + (S^* - h_2)/a_N] \le M$.

To prove (3.7), take $\epsilon = (h_2/8S^*) \wedge \frac{1}{2}$, and $h_1 = \epsilon S^*/2$. Pick $N$ large
enough so that $a_i/a_N \le 1 + \epsilon$ for $i \ge N \ge N_\epsilon$. Then for $i$ such that
$a_i/a_N \ge 1 - \epsilon$ we have

$[-S^* + h_1 + (S^* - h_2)a_i/a_N]/a_i \le [h_1 + \epsilon S^* - h_2(1 - \epsilon)]/a_i \le [-h_2/4]/a_i$.

On the other hand, if $a_i/a_N < 1 - \epsilon$, we obtain the following bound for
the exponent:

$[h_1 - \epsilon S^*]/a_i = [-\epsilon S^*/2]/a_i$.

Hence (3.7) follows from (A2.7).


□ 


4. A PROOF OF (A2.6) FOR EXOGENOUS NOISE AND SMOOTH
DYNAMICS

In this section, we prove (A2.6) under more readily verifiable conditions,
and in the exogenous noise case. In Section 5, we state and discuss other sets
of conditions under which a similar proof yields (A2.6).

First, we show that the H-functionals as defined by (2.7) are the
appropriate ones for the case $a_n \to 0$ in a general setting. Then, in
Theorem 4.2, a basic sufficient condition for (A2.6) will be obtained.

Theorem 4.1. Assume (A2.1), (A2.2) and (A2.5). Then for the
exogenous noise case and uniformly in $\omega$ (w.p.1) and in $t$ in any bounded
interval,

(4.1a)  $\limsup_{\Delta \to 0} \frac{1}{\Delta} \limsup_N a_N \log E_{\mathcal{F}_N} \exp \sum_{i=m(t_N+t)}^{m(t_N+t+\Delta)} \langle \alpha, a_i b_i(x,\xi_i) \rangle / a_N \le H(x,\alpha,t)$.

For the state dependent noise case, and uniformly in $\xi$ and in $t$ in any bounded
interval,

(4.1b)  $\limsup_{\Delta \to 0} \frac{1}{\Delta} \limsup_N a_N \log E_\xi \exp \sum_{i=m(t_N+t)}^{m(t_N+t+\Delta)} \langle \alpha, a_i b_i(x,\xi_i) \rangle / a_N \le H(x,\alpha,t)$.

Assume in addition (A2.3). Then for any $0 \le T_1 < T_2 < \infty$, $H_\alpha(x,0,t)$
equals $\bar b(x)$, where $\bar b(x)$ also satisfies

(4.2)  $\bar b(x) = \frac{1}{T_2 - T_1} \lim_N E_{\mathcal{F}_N} \sum_{i=m(t_N+T_1)}^{m(t_N+T_2)} a_i b_i(x,\xi_i)$,

with an analogous statement holding for the state dependent noise case.


Proof. We only prove (4.1a). The proof of (4.1b) is similar. Also, (4.2) is
obvious by (A2.5). Fix $\Delta > 0$ and define $\bar b_i^N(x) = E_{\mathcal{F}_N} b_i(x,\xi_i)$. Rewrite the
left side of (4.1a) as

(4.3)  $\limsup_{\Delta} \frac{1}{\Delta} \limsup_N a_N \log E_{\mathcal{F}_N} \exp \sum_{i=m(t_N+t)}^{m(t_N+t+\Delta)} \langle \alpha, a_i (b_i(x,\xi_i) - \bar b_i^N(x)) \rangle / a_N$

$+ \limsup_{\Delta} \frac{1}{\Delta} \limsup_N a_N \sum_{i=m(t_N+t)}^{m(t_N+t+\Delta)} \langle \alpha, a_i \bar b_i^N(x) \rangle / a_N$.

Under (A2.2) and (A2.5), the last term in (4.3) equals $\langle \alpha, \bar b(x) \rangle$. Thus, we need
only work with the first term in (4.3). It will be proved that the first term is
bounded above by

(4.4)  $K^{-1}(t) H_0(x, K(t)\alpha)$,

which will yield the theorem in view of the definition $H(x,\alpha,t) = K^{-1}(t) H_0(x,K(t)\alpha) + \langle \alpha, \bar b(x) \rangle$.

By differentiating the part of the first term of (4.3) to the right of the $a_N$
term with respect to $a_i$, we see that it is convex in $\{a_i\}$, non-negative and zero
if $a_i = 0$. Because of this, the definitions of the $K_N(\cdot)$ (below (A2.5)) imply
that there are $\epsilon_N(\Delta)$ tending to zero as $N \to \infty$ and then $\Delta \to 0$ such that an
upper bound to the first term of (4.3) is obtained by replacing the $a_i/a_N$ there
by an upper bound $(K_N(t) + \epsilon_N(\Delta))$, and by replacing the left hand $a_N$ by an
upper bound $\Delta(K_N^{-1}(t) + \epsilon_N(\Delta))/[m(t_N+t+\Delta) - m(t_N+t)]$.

We next make use of the following fact. Given a convex function $H(\alpha)$
such that $H(0) = 0$ and $H(\alpha) \ge 0$, the inequality $H(s\alpha') \le sH(\alpha')$ is valid for
all $0 \le s \le 1$, and for all $\alpha'$. Picking $s = s_1/s_2$ and $\alpha' = s_2\alpha$, we obtain for
all $0 < s_1 \le s_2$ and for all $\alpha$ that

(4.5)  $s_1^{-1} H(s_1\alpha) \le s_2^{-1} H(s_2\alpha)$.
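For completeness, the convexity fact and the passage to (4.5) can be written out in two lines. The derivation below is a standard argument supplied here as a reading aid; it is not reproduced from the report.

```latex
% Convexity with H(0) = 0, for 0 \le s \le 1:
H(s\alpha') = H\bigl(s\alpha' + (1-s)\cdot 0\bigr)
  \le s\,H(\alpha') + (1-s)\,H(0) = s\,H(\alpha').
% Substituting s = s_1/s_2 and \alpha' = s_2\alpha, with 0 < s_1 \le s_2:
H(s_1\alpha) \le \frac{s_1}{s_2}\,H(s_2\alpha)
  \quad\Longleftrightarrow\quad
  s_1^{-1}H(s_1\alpha) \le s_2^{-1}H(s_2\alpha).
```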


We will take $s_1 = K_N(t) + \epsilon_N(\Delta)$ and $s_2 = K(t) + \epsilon_N(\Delta) + \Delta$.

Since $K_N(t) \le K(t) + \Delta$ for large $N$, an upper bound to the first term in
(4.3) is also obtained by replacing the $a_i/a_N$ by $(K(t) + \epsilon_N(\Delta) + \Delta)$ and the $a_N$ by

$\dfrac{\Delta (K_N^{-1}(t) + \epsilon_N(\Delta))(K_N(t) + \epsilon_N(\Delta))}{(K(t) + \epsilon_N(\Delta) + \Delta)(m(t_N+t+\Delta) - m(t_N+t))}$.

Doing the substitution, using the definitions of $\bar b(x)$ and $H(x,\alpha)$ and taking
limits yields the desired bound (4.4) for the first term in (4.3). □

Remark. We have used the fact that (A2.5) implies $K_N^{-1}(t)$ is bounded
from above uniformly in $N$ for $N$ large.


In Theorem 4.2, we prove (A2.6) under condition (A4.1) below.

Remark. For a continuous parameter problem in [12], Freidlin uses a
continuous parameter analog of (A4.1), with Lipschitz continuity of $b(\cdot,\xi)$ and
continuity and $\alpha$-differentiability of $H(\cdot,\cdot)$ to get the large deviations
inequalities. He uses lim rather than lim sup to define his H-functional. An
examination of the proof in [12] shows that (uniform) continuity of $b(\cdot,\xi)$ is
enough. Also, for our 'upper bounding' needs the $\le$ in the lim sup of (A2.1) is
enough. It is not actually necessary that the lim exists.

A4.1. $\{\xi_n\}$ is exogenous. The random vector field $b_n(x,\xi)$ is deterministic,
and so we write it as $b(x,\xi)$, where $b(\cdot,\xi)$ is continuous, uniformly in $\xi$, and
$|b(x,\xi)| \le K < \infty$.


Theorem 5.1 extends Theorem 4.2 to the 'non-deterministic' random field
case.

Theorem 4.2. (A2.6) holds under (A2.1), (A2.3), (A2.5) and (A4.1).

Remark. The proof is along the lines of Freidlin, Theorem 2.1 of [12],
with appropriate modifications for our use of lim sup, $a_n \to 0$ and the
uniformity in $x$ required in (A2.6). We will use the results in Freidlin's
proof whenever possible to simplify our argument.

Proof. (i) Since $b(x,\xi)$ is continuous in $x$, the lim sup defining $H_0$ and $H$ and
$\bar b$ are taken on uniformly in $x$ in any compact set, and the lim sup in (4.1a) also
holds uniformly in $x$ (and also in $\omega$, w.p.1). This uniformity implies the
following. Let $T < \infty$. Let $F \subset R^r$ be compact and let $\Delta > 0$. Let $\alpha(\cdot)$ and
$\psi(\cdot)$ be functions defined on $[0,T]$ that are constant on intervals of the form
$[i\Delta, i\Delta+\Delta)$, and let $\psi(\cdot)$ be $F$-valued. (Assume w.l.o.g. that $T$ is an integral
multiple of $\Delta$.) Then, uniformly in $\psi(\cdot)$ and $\omega$,

(4.6)  $\limsup_N a_N \log E_{\mathcal{F}_N} \exp \sum_{i=N}^{m(t_N+T)} \langle \alpha(t_i - t_N), a_i b(\psi(t_i - t_N),\xi_i) \rangle / a_N \le \int_0^T H(\psi(t),\alpha(t),t)\,dt$.

(ii) For fixed $x$ and the above defined $\psi(\cdot)$, define the process $\{X_n^{\psi,N}\}$,
analogously to the definition of $\{X_n^{N,x}\}$, by $X_N^{\psi,N} = x$ and

(4.7)  $X_{n+1}^{\psi,N} = X_n^{\psi,N} + a_n b(\psi(t_n - t_N),\xi_n)$,

and its piecewise linear version (analogous to the definition of $X^{N,x}(\cdot)$)

(4.8)  $X^{\psi,N}(t) = [(t - (t_n - t_N))X_{n+1}^{\psi,N} + ((t_{n+1} - t_N) - t)X_n^{\psi,N}]/a_n$

for $t \in [t_n - t_N, t_{n+1} - t_N)$.

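The process $X^{\psi,N}(\cdot)$ plays an important intermediary role in getting the
desired large deviations result, since it is relatively easy to get one for
$X^{\psi,N}(\cdot)$, and then to extend it by suitable choices of $\psi(\cdot)$.

Define $X_\Delta^{\psi,N} = (X^{\psi,N}(i\Delta), i = 1, \ldots, T/\Delta)$. We next prove a large deviations
upper bound for the vector $X_\Delta^{\psi,N}$ which will be uniform in $x$ (in any compact set
and also in $\omega$, w.p.1). Let $F_1 \subset R^r$ be compact. Let $\alpha_i \in R^r$, $i \le T/\Delta$, and define $\alpha(\cdot)$
by (the manipulations at this point are similar to those used in [12, Lemma 3.1])

$\alpha(s) = \sum_{i=k}^{T/\Delta} \alpha_i$,  $s \in [k\Delta - \Delta, k\Delta)$.

Then by (4.6) (where the lim sup is uniform in $x \in F_1$, $\omega$ (w.p.1) and $\psi(\cdot)$)

(4.9)  $\limsup_N a_N \log E_{\mathcal{F}_N} \exp \sum_{i=1}^{T/\Delta} \langle \alpha_i, X^{\psi,N}(i\Delta) \rangle / a_N$

$= \limsup_N a_N \log E_{\mathcal{F}_N} \exp \sum_{i=N}^{m(t_N+T)} \langle \alpha(t_i - t_N), a_i b(\psi(t_i - t_N),\xi_i) \rangle / a_N + \langle x, \sum_{i=1}^{T/\Delta} \alpha_i \rangle$

$\le \int_0^T H(\psi(t),\alpha(t),t)\,dt + \langle x, \sum_{i=1}^{T/\Delta} \alpha_i \rangle$

$= \Delta \sum_{i=0}^{T/\Delta - 1} H(\psi(i\Delta), \sum_{j=i+1}^{T/\Delta} \alpha_j, i\Delta) + \langle x, \sum_{i=1}^{T/\Delta} \alpha_i \rangle$

$\equiv h^{x,\psi}(\alpha_1, \ldots, \alpha_{T/\Delta})$.

For $\{\beta_i, i \le T/\Delta\} = \beta \in (R^r)^{T/\Delta}$, define $l^{x,\psi}(\beta_1, \ldots, \beta_{T/\Delta})$ to be the Legendre
transform of $h^{x,\psi}(\alpha_1, \ldots, \alpha_{T/\Delta})$.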
Remark. The last equality in (4.9) is not really correct, since $H(\psi,\alpha,t)$
may differ from $H(\psi,\alpha,i\Delta)$ over the interval $[i\Delta, i\Delta + \Delta]$. However, since the
Legendre transform of $H(\psi,\alpha,s)$ is $K^{-1}(s)L(\psi,\beta)$, neglecting this variation
amounts to no more than multiplying $l^{x,\psi}(\beta)$ by a scale factor which tends to
one as $\Delta$ tends to zero, uniformly in all the other variables $(x,\psi,\beta)$. Since we
are subsequently allowed to choose $\Delta > 0$ as small as desired, we can safely
ignore the time variations over the interval $[i\Delta, i\Delta + \Delta]$ as a matter of
notational convenience. We maintain this convention in later proofs as well
but will use $\approx$ or $\lesssim$ rather than $=$ or $\le$ to indicate that we are ignoring
such a scale factor.

Define $\Phi_\Delta^{x,\psi}(s) = \{\beta: l^{x,\psi}(\beta) \le s\}$. Then (4.9) and a theorem of Gartner's
([16], Lemma 1.1) imply that for any $\delta > 0$, $h > 0$ there is an $N_0 < \infty$ such that
$N \ge N_0$ implies that (for $x \in F_1$, $\psi(\cdot)$ as above)

(4.10)  $P_{\mathcal{F}_N}\{d_r(X_\Delta^{\psi,N}, \Phi_\Delta^{x,\psi}(s)) \ge \delta\} \le \exp[-(s-h)/a_N]$.

Here $d_r$ is the Euclidean metric on $(R^r)^{T/\Delta}$.

In the proof of his result, Gartner used a definition of (his) H-functional
(it is the function $G$ in (1.1) in [16]) which involved a lim rather than a lim sup.
But the proof of his Lemma 1.1 is valid if lim sup is used or any upper (l.s.c.)
bound to the lim sup is used, if that upper bound is used to compute the
L-functional. Also, according to the proof in [16], the inequality (4.10) is valid
uniformly in all variables in which the inequality (4.1a) is attained uniformly
as $N \to \infty$, $\Delta \to 0$. Hence (4.10) holds for a.a. $\omega$, all $x \in F_1$ and $\psi(\cdot)$ as above.

(iii) From this point on the details are essentially the same as for the
classical case [12, Theorem 2.1] (which also uses Gartner's result (4.10) for the
'classical' case), and only an outline will be given. The interested reader
should refer to [12] to fill in the gaps. A main difference is that we must be
more careful about the uniformity of the estimates in $x$ and $\omega$. The argument
can be divided into the following steps.

(a) From the definitions of $l^{x,\psi}$ and $L$, it can be shown [12, Lemma 3.1, p.
137] that

$l^{x,\psi}(\beta) \approx \int_0^T L(\psi(s),\dot\beta(s),s)\,ds$,

where we define $\beta(\cdot)$ by the linear interpolation

$\beta(s) = [(i\Delta + \Delta - s)\beta_i + (s - i\Delta)\beta_{i+1}]/\Delta$, for $s \in [i\Delta, i\Delta + \Delta]$.

(b) Since $|b(x,\xi)| \le K$, the $X^{\psi,N}(\cdot)$ are Lipschitz continuous with constant
$K$. Since

$\inf_{|\beta| > K} L(x,\beta,t) = \infty$

for all $x$, the paths in the sets

$\Phi_x(s) = \{\phi: S_x(T,\phi) \le s\}$

and

$\Phi^{x,\psi}(s) = \{\phi: \phi(0) = x, \int_0^T L(\psi(t),\dot\phi(t),t)\,dt \le s\}$

are also Lipschitz continuous with constant $K$.

These facts imply that given $\delta > 0$, there are $\Delta_0 > 0$ and $\delta' > 0$ such that
for $\Delta < \Delta_0$ and all $x$ ($d$ and $d_r$, resp., are the sup norm and Euclidean
distances)

$\{d(X^{\psi,N}, \Phi^{x,\psi}(s)) \ge \delta\} \subset \{d_r(X_\Delta^{\psi,N}, \Phi_\Delta^{x,\psi}(s)) \ge \delta'\}$.

This and (4.10) imply that given $h > 0$ and $\delta > 0$ there is $N_0' < \infty$ such that for
$N \ge N_0'$, a.a. $\omega$, $x \in F_1$ and $\psi$ as above we have

(4.11)  $P_{\mathcal{F}_N}\{d(X^{\psi,N}, \Phi^{x,\psi}(s)) \ge \delta\} \le \exp[-(s-h)/a_N]$.

(c) As a consequence of the l.s.c. property (as a function of $(\psi(\cdot),\phi(\cdot))$) of
$\int_0^T L(\psi(s),\dot\phi(s),s)\,ds$, given $h > 0$, there is $\delta_1 > 0$ such that if $d(\psi,\phi) \le \delta_1$ and $x =
\phi(0) \in F_1$ and $S_x(T,\phi) \ge s$, then $\int_0^T L(\psi(s),\dot\phi(s),s)\,ds \ge s - h$ [12; p. 142].

(d) Since $b(\cdot,\xi)$ is continuous, uniformly in $\xi$, given $h > 0$ and $\delta_1 > 0$ (as
in (c)) and $\delta_2 > 0$, there is a $\delta > 0$ and $\delta_1' > 0$ (and $\le \delta_1$) such that $\phi \in C_x[0,T]$,
$d(\phi,\psi) \le \delta_1'$ implies that

(4.12)  $\{d(X^{N,x},\phi) < \delta\} \subset \{d(X^{\psi,N},\phi) < \delta_2\}$.

In [12], Freidlin uses a Lipschitz condition on $b(\cdot,\xi)$ to get the set inclusion
analogous to (4.12). But continuity is also sufficient.

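(e) We now combine the facts in (a) - (d). Let $h > 0$ be given and define
$\delta_2$, $\delta_1'$, $\delta$ as in part (d). Set $\hat\delta = \min[\delta_2,\delta,\delta_1']/2$. Define the compact set $R(x)
= \{\phi \in C_x[0,T]: \phi$ is Lipschitz continuous with constant $K\}$. Let $\{\phi_i^0, i \le M\}$ be
a $\hat\delta$-net of $R(0)$. Then $\{\phi_i^x = x + \phi_i^0, i \le M\}$ is a $\hat\delta$-net of $R(x)$. Choose $\Delta > 0$
and $\psi_i^0$ such that the $\psi_i^0$ are constant on the intervals $[j\Delta, j\Delta+\Delta)$, $j < T/\Delta$, and
$\sup_i d(\phi_i^0,\psi_i^0) \le \hat\delta/2$. Define $\psi_i^x = x + \psi_i^0$. For $x$ ranging over $F_1$, the $\psi_i^x(t)$,
$t \le T$, take values in some compact set.

By the set inclusion in (4.12) and the definition of $\hat\delta$, we have

(4.13)  $P_{\mathcal{F}_N}\{d(X^{N,x}, \Phi_x(s)) \ge \delta\} \le \sum_{i=1}^{M} P_{\mathcal{F}_N}\{d(X^{N,x},\phi_i^x) < \hat\delta\} I_{\{d(\phi_i^x,\Phi_x(s)) > \hat\delta\}}$

$\le \sum_{i=1}^{M} P_{\mathcal{F}_N}\{d(X^{\psi_i^x,N},\phi_i^x) \le \delta_2\} I_{\{d(\phi_i^x,\Phi_x(s)) > \hat\delta\}}$.

If $d(\phi,\Phi_x(s)) > 0$, then $S_x(T,\phi) > s$. It follows from part (c) that
$I_{\{d(\phi_i^x,\Phi_x(s)) > \hat\delta\}} = 1$ implies

(4.14)  $\{d(X^{\psi_i^x,N},\phi_i^x) \le \delta_2\} \subset \{X^{\psi_i^x,N} \in \{\phi: \int_0^T L(\psi_i^x(t),\dot\phi(t),t)\,dt \ge s - h, \phi(0) = x\}\}$.

Now, by part (b) there is $N_0 < \infty$ (not depending on $x \in F_1$) such that for
$N \ge N_0$ and a.a. $\omega$,

(4.15)  $P_{\mathcal{F}_N}\{d(X^{\psi_i^x,N},\phi_i^x) \le \delta_2\} I_{\{d(\phi_i^x,\Phi_x(s)) > \hat\delta\}} \le \exp[-(s - 2h)/a_N]$.

Combining (4.13) and (4.15) yields that there is $N_0' < \infty$ (not depending on $x \in
F_1$ or on $\omega$ (w.p.1)) such that

(4.16)  $P_{\mathcal{F}_N}\{d(X^{N,x}, \Phi_x(s)) \ge \delta\} \le \exp[-(s - 3h)/a_N]$.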


Now suppose that we are given $A \subset C_x[0,T]$ satisfying $\inf_{\phi \in A} S_x(T,\phi) > s$. We
claim that $d(A,\Phi_x(s-h)) > 0$. If not, there are $\phi_i^1 \in A$, $\phi_i^2 \in \Phi_x(s-h)$ such that
$d(\phi_i^1,\phi_i^2) \to 0$. Since $\Phi_x(s-h)$ is compact, we can assume that $\phi_i^2 \to \phi \in \Phi_x(s-h)$.
Then $\phi_i^1 \to \phi$ implies that $\phi \in A$. By the l.s.c. of $S_x(T,\cdot)$, we have $S_x(T,\phi) \le
\liminf_i S_x(T,\phi_i^2) \le s - h$, a contradiction. It follows that there is $\delta > 0$ such that
$\inf_{\phi \in A} S_x(T,\phi) \ge s$ implies that $d(A,\Phi_x(s-h)) \ge \delta$ for all such $A$. Together with
(4.16) (with $s$ replaced by $s - h$ there), this yields the existence of $N_0 < \infty$ (not
[...] sequence of zero mean normally distributed random variables with
covariance $\sigma^2 I$ and which alter the paths by a very small amount with a very
high probability for $\sigma$ small. Then a sequence $\{v_n\}$ of i.i.d. random variables is
constructed such that the random field $b_n(\cdot) + [\ldots]$ can be represented (in the
sense that the resulting processes have the same measures) as $F(x_n,\xi_n,v_n)$,
where $F$ is continuous in $x$. Then, a proof very close to that of Theorem 4.2 is
used. The details for a scalar case (for notational simplicity) where $b_n(x,\xi) = b_n(x)$
are given in Appendix 2. They are an adaptation of the proof of the vector case
large deviations upper bound given in [17] for the constant $a_n = \epsilon$ case. An
analogous adaptation for the general vector case yields Theorem 5.1.
5.2. State Dependent Noise

We now prove the form of Theorem 4.2 for the state dependent noise case. In
order to provide a reasonably general proof, we generalize slightly the definition
of the state dependent $\{\xi_n\}$ or Markov $\{\xi_{n-1},X_n\}$ processes. Let $\{x_n\}$ be a sequence
of either random variables and/or constants and $\{\xi_n\}$ a random sequence with the
following properties. $x_n$ is a function (perhaps deterministic) of $\{x_j,\xi_j, j < n\}$, and
$P\{\xi_n \in \cdot \mid \xi_{n-1} = \xi, x_{n-1} = x, \xi_{n-i}, x_{n-i}, i \le n\} = P^x(\xi,\cdot)$. Such an $\{x_n\}$ sequence is said
to generate $\{\xi_n\}$. Most often $x_n = X_n$ or $X_n^{N,x}$ for some (to be stated where
necessary) initial condition and starting time.

We use the following condition, where $H(x,\alpha)$ was defined in (2.2b). The
condition is satisfied in many problems of practical interest. An example
will be given at the end of the subsection.

A5.2. (i) $\{b_j(\cdot,\cdot)\}$ is i.i.d. and $|b_j(x,\xi)| \le K$.

(ii) Given $\gamma > 0$, there is a $\delta > 0$ such that if $\{x_i\}$ generates $\{\xi_i\}$ and $|x_i - x| \le \delta$
for all $i$, then ($E_\xi$ denotes the expectation given the initial condition $\xi_0 = \xi$)

(5.1)  $\limsup_N \frac{1}{N} \log E_\xi \exp \langle \alpha, \sum_{i=1}^{N} b_i(x_i,\xi_i) \rangle \le H(x,\alpha) + \gamma(|\alpha| + 1) \equiv H_\gamma(x,\alpha)$,

uniformly in $\xi \in M$ and in $x$ in any compact set.


Theorem 5.2. Under (A5.2) and (A2.1), (A2.3), (A2.5), condition (A2.6)
holds.

Proof. The proof will be set up so that it can be completed by an
argument of the type used in parts (c) and (e) of Theorem 4.2. The basic
technique is adapted from [20], where a general treatment of the upper and
lower large deviations bounds is obtained for the constant $a_n = a > 0$ case.

Fix $\gamma > 0$, and let $\delta$ be defined by (A5.2). The proof of Theorem 4.1
adapted to condition (A5.2) yields: for any sequence $\{x_i\}$ generating $\{\xi_i\}$ and
satisfying $|x_i - x| \le \delta$, we have (uniformly in $\xi$ and in $x$ in any compact set
and in the sequence $\{x_i\}$),

(5.2)  $\limsup_{\Delta \to 0} \frac{1}{\Delta} \limsup_N a_N \log E\left[\exp \sum_{i=m(t_N+t)}^{m(t_N+t+\Delta)} \langle \alpha, a_i b_i(x_i,\xi_i) \rangle / a_N \,\Big|\, \xi_{m(t_N+t)} = \xi\right] \le H_\gamma(x,\alpha,t)$,

where

$H_\gamma(x,\alpha,t) = K^{-1}(t)H_0(x,K(t)\alpha) + \langle \alpha, \bar b(x) \rangle + \gamma K^{-1}(t)(K(t)|\alpha| + 1)$,

analogously to the case in Section 4. Define $L_\gamma$ to be the dual of $H_\gamma$.

By (5.2) and the theorem of Gartner referred to in Theorem 4.2, if $\{x_i\}$
generates $\{\xi_i\}$ and $|x_i - x| \le \delta$, then for Borel $A$,

(5.3)  $\limsup_N a_N \log P\left\{\sum_{i=m(t_N+t)}^{m(t_N+t+\Delta)} a_i b_i(x_i,\xi_i) \in A \,\Big|\, \xi_{m(t_N+t)} = \xi\right\} \lesssim -\inf_{\beta \in A} \Delta L_\gamma(x,\beta/\Delta,t)$,

where the estimate is uniform on any compact $(x,\xi)$ set, as well as in the
sequence $\{x_i\}$.

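Henceforth $\phi(\cdot)$ is some function in $C_x[0,T]$ with Lipschitz constant $\le K$.
Recall the definition of $\{X_n^{N,x}, n \ge N\}$ from (2.9). The sequence generating $\{\xi_i\}$
will be the $x$-arguments in $b_i(\cdot,\xi_i)$ in the functions below. It will usually be
$\{X_n^{N,x}, n \ge N\}$. Let $\Delta < \delta/(\gamma + K)$. Define $D_\Delta\phi(t) = \phi(t + \Delta) - \phi(t)$. Then, it
follows from (5.3) that (uniformly in each compact $t$, $x$, $\xi$ set)

(5.4)  $\limsup_N a_N \log P\left\{\left|\sum_{i=m(t_N+t)}^{m(t_N+t+\Delta)} a_i b_i(X_i^{N,x},\xi_i) - D_\Delta\phi(t)\right| \le 2\gamma\Delta \,\Big|\, \xi_{m(t_N+t)} = \xi, |X^{N,x}(t) - \phi(t)| \le \gamma\Delta\right\}$

$\lesssim -\inf_{\{\beta: |\beta - D_\Delta\phi(t)| \le 2\gamma\Delta\}} \Delta L_\gamma(\phi(t),\beta/\Delta,t)$.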


By [20, Lemma 2.4], (A5.2ii) implies that for given $\beta$, we can find $\beta'$
such that $|\beta - \beta'| \le \gamma\Delta$ and

(5.5)  $L(\phi(t),\beta'/\Delta,t) \le L_\gamma(\phi(t),\beta/\Delta,t) + \gamma$.

By using this in (5.4) we can replace the right side by

(5.6)  $-\inf_{\{\beta: |\beta - D_\Delta\phi(t)| \le 3\gamma\Delta\}} \Delta L(\phi(t),\beta/\Delta,t) + \gamma\Delta$.

(Since $\gamma$ can be made as small as desired, the added $\gamma\Delta$ will eventually be
dropped.)

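Fix $T < \infty$ to be an integral multiple of $\Delta$. Then (uniformly in $x$, $\xi$ on
each compact set) (5.4) and (5.6) yield

(5.7)  $\limsup_N a_N \log P\{d(X^{N,x},\phi) \le \gamma\Delta \mid \xi_N = \xi\}$

$\le \limsup_N a_N \log P\{|X^{N,x}(i\Delta) - \phi(i\Delta)| \le \gamma\Delta, i \le T/\Delta \mid \xi_N = \xi\}$

$\le \sum_{i=0}^{T/\Delta - 1} \limsup_N a_N \log P\{|D_\Delta(X^{N,x}(i\Delta) - \phi(i\Delta))| \le 2\gamma\Delta \mid |X^{N,x}(i\Delta) - \phi(i\Delta)| \le \gamma\Delta, \xi_{m(t_N+i\Delta)}\}$

$\lesssim -\Delta \sum_{i=0}^{T/\Delta - 1} \inf_{\{\beta: |\beta - D_\Delta\phi(i\Delta)| \le 3\gamma\Delta\}} L(\phi(i\Delta),\beta/\Delta,i\Delta) + \gamma T$.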
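Let $\phi_\Delta(t) = \phi(i\Delta)$ for $t \in [i\Delta, i\Delta + \Delta)$, and define $A_\Delta^\gamma(\phi) = \{\psi \in C_x[0,T]: \dot\psi$ is
constant for $t \in [i\Delta, i\Delta + \Delta)$, and $|\dot\psi(i\Delta) - D_\Delta\phi(i\Delta)/\Delta| \le 3\gamma$, $i < T/\Delta\}$. Then the
r.h.s. of (5.7) may be replaced by

(5.8)  $-\inf_{\psi \in A_\Delta^\gamma(\phi)} \int_0^T L(\phi_\Delta(t),\dot\psi(t),t)\,dt + \gamma T$.

Since $S_x(T,\phi) < \infty$ implies that $\phi$ satisfies a Lipschitz condition that is
independent of $x$, there is $\Delta > 0$ (independent of $\phi$ and $x$) such that for $\phi$
satisfying $S_x(T,\phi) < \infty$, we have

(5.9)  $\sup_{0 \le t \le T} |\phi_\Delta(t) - \phi(t)| \le \gamma$,

$\sup_{\psi \in A_\Delta^\gamma(\phi)} \sup_{0 \le t \le T} |\psi(t) - \phi(t)| \le 4\gamma T$.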

Using the result of part (c) of Theorem 4.2 we can now complete the
estimate just as we did in part (e) of that theorem. Let $s > 0$ and $h > 0$ be
given. By picking $\gamma$ small (which implies $\Delta > 0$ is small) we have that $x \in F$,
$S_x(T,\phi) \ge s$ implies

(5.10)  $-\inf_{\psi \in A_\Delta^\gamma(\phi)} \int_0^T L(\phi_\Delta(t),\dot\psi(t),t)\,dt \le -S_x(T,\phi) + h$.

Let $\gamma > 0$ now be fixed, and set $\hat\delta = \min[\delta,\Delta\gamma]/2$. Define $R(x)$ and
choose the $\hat\delta$-net of $R(x)$ whose cardinality is independent of $x \in F$ as in
part (e) of the proof of Theorem 4.2. Then using (5.7) through (5.10), for
large enough $N$ (and independent of $\xi$ and $x$) we have

$P_\xi\{d(X^{N,x},\Phi_x(s)) \ge \delta\}$

$\le \sum_{i=1}^{M} P_\xi\{d(X^{N,x},\phi_i^x) \le \gamma\Delta\} I_{\{d(\phi_i^x,\Phi_x(s)) > \hat\delta\}}$

$\le \sum_{i=1}^{M} \exp[(-S_x(T,\phi_i^x) + 2h + \gamma T)/a_N] I_{\{d(\phi_i^x,\Phi_x(s)) > \hat\delta\}}$

$\le M \exp[(-s + 2h + \gamma T)/a_N]$.

(A2.6) follows from this estimate in the same way it followed from (4.16) in
Theorem 4.2. □


Example. We present one example of the processes that may be handled
by the methods of this subsection. The example includes the adaptive routing
process mentioned in the introduction. For additional examples, the reader
may refer to [20].

The basic method for proving that an assumption such as (A5.2ii) holds is
to first show (2.2b), and to then use (assuming it exists) the continuity
properties of the measure induced by $b_n(x,\xi)$ as a function of $x$, which must
be uniform when conditioned on $\xi$. Conditions under which (2.2b) holds are
given in Example 4 of Section 2. For more details on that and other
examples, the reader is referred to [24].

We next state the assumptions of the example, then discuss them, and then
prove (A5.2ii). Assume (i) - (iii) below.

(i) There are functions $a_i(x)$, $p_i(x,\xi) \ge 0$, $1 \le i \le N$, with $a_i(x)$ continuous in $x$
and $p_i(x,\xi)$ continuous in $x$, uniformly in $\xi$. Furthermore $\sum_i p_i(x,\xi) = 1$ and
each $p_i$ is either identically zero or bounded from below by some $c > 0$. Let
the measure of the random field $b_n(x,\xi)$ assign probability $p_i(x,\xi)$ to the point $a_i(x)$.

(ii) The process $\{\xi_n\}$ takes values $1, \ldots, M$, where $M$ does not depend on $x$. Let
$P^x_{ij}$ and $P^{x,n}_{ij}$ denote the 1 and $n$-step transition probabilities of $\{\xi_n\}$.

(iii) There is $n_0$ (not depending on $x$) such that $P^{x,n_0}_{ij}$ is continuous in $x$ and
strictly positive in $x$, $i$, $j$.

The third assumption implies that there is a uniform (in $x$) lower bound
on the geometric rate at which the measures induced by $\{\xi_n\}$ converge to the
invariant measure.

Discussion. In many applications, the distribution of the random field
$b_n(x,\xi)$ is concentrated on a finite number of points which move continuously
with $x$, and which do not depend on $\xi$. Then we let these points be the $a_i(x)$, $i
\le N$, and then $p_i(x,\xi) = P(b_n(x,\xi) = a_i(x) \mid x,\xi)$. In the above cited 'routing
example' $b_n$ might take values 0 or 1 (if $b_n$ is an 'indicator' function) or values
$a_1(x)$, $a_2(x)$, if it is of the form $a_1(x)I_1 + a_2(x)I_2$, where the $I_i$ are indicators of
events which depend on the arrivals, departures, routing realizations, etc., but
whose distributions depend on the current values of $x$, $\xi$. Both forms have
been used in the literature.

Proof of (A5.2ii). By (ii), (iii) above, (2.2b) holds [24]. In order to
simplify the notation, we set $n_0 = 1$. The general proof is very similar. Let $F$
be a fixed compact set. Let $c > 0$ also be a lower bound for $P^x_{ij}$, $x \in F$.

Given $\gamma > 0$, choose $\delta > 0$ such that $|x - y| \le \delta$ implies that $|a_k(x) - a_k(y)| \le \gamma$,
$p_i(y,\xi) \le p_i(x,\xi)\exp\gamma$ for all $\xi$, and $P^y_{ij} \le P^x_{ij}\exp\gamma$ for all $i$, $j$, and $x \in F$.

For fixed $x$, let $\{x_i\}$ satisfy the hypothesis of (A5.2ii). The transition
probability used to get the $\xi_n$ in $b_n(x_n,\xi_n)$ will be $P^{x_n}$. Thus $\{x_i\}$ (or
whatever sequence replaces it below) generates $\{\xi_i\}$. Then

(*)  $E_\xi \exp\langle \alpha, \sum_{i=1}^{n} b_i(x_i,\xi_i) \rangle = E_\xi \exp\langle \alpha, \sum_{i=1}^{n-1} b_i(x_i,\xi_i) \rangle E[\exp\langle \alpha, b_n(x_n,\xi_n) \rangle \mid \xi_{n-1}, x_n]$

$= E_\xi\left[\exp\langle \alpha, \sum_{i=1}^{n-1} b_i(x_i,\xi_i) \rangle \sum_{j=1}^{N} \sum_{k=1}^{M} \exp\langle \alpha, a_j(x_n) \rangle p_j(x_n,k) P^{x_n}_{\xi_{n-1},k}\right]$.

Now, for all $x_n$, $\xi_{n-1}$, the last bracketed term is bounded above by

(**)  $\sum_{j=1}^{N} \sum_{k=1}^{M} \exp\langle \alpha, a_j(x) \rangle p_j(x,k) P^x_{\xi_{n-1},k} \exp[(|\alpha| + 2)\gamma]$.

Using (**) in (*) and continuing to iterate backwards to approximate all the $x_i$
by $x$ plus an 'error' yields the upper bound to (*) of

$E_\xi \exp[\langle \alpha, \sum_{i=1}^{n} b_i(x,\xi_i) \rangle + n(|\alpha| + 2)\gamma]$.

(A5.2ii) follows from this and the convergence in (2.2b).

5.3. The Constrained Algorithm

We now outline the extension of Theorems 3.1 and 4.2 for the algorithm (1.2), where $G$ is a convex set which is the closure of its interior and whose boundary consists of a finite number of smooth ($C^2$) sections. References [18], [19] contain a detailed discussion of large deviations bounds (upper and lower) for constrained algorithms, under constant gains $a_n = a > 0$. (Reference [18] is an abbreviated form of [19].) The technique used there is an adaptation of the method of Freidlin [12] for the unconstrained case. In order to simplify the development, we will use the assumptions of [18], [19], and discuss the main questions concerning the adaptation of the proof there to the present case. First, we define the mean projected dynamics for

\[
X_{n+1} = \Pi_G\big(X_n + a_n b(X_n,\xi_n)\big).
\]

The processes $\{X_n^{N,x}\}$, $\{X_n^{\phi,N}\}$ and the various linear interpolations are all defined as they were in (2.10), (2.11), (4.7) and (4.8). All neighborhoods and sets used below are relative to $G$.
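
As an illustration only, the projected recursion can be simulated in a few lines. The sketch below is not taken from the report: it assumes a hypothetical box $G = [0,1]^2$, a hypothetical linear mean field with zero at the point `target` (which lies outside $G$ in the first coordinate), bounded uniform noise, and gains $a_n = 1/(n+1)$. Under these assumptions the iterate should settle near the boundary point $(1, 0.5)$, which is where the projected mean dynamics come to rest.

```python
import random

def project_box(x, lo=0.0, hi=1.0):
    """Euclidean projection onto the box G = [lo, hi]^r (componentwise clipping)."""
    return [min(max(xi, lo), hi) for xi in x]

def run_projected_sa(n_steps=5000, seed=0):
    """Iterate X_{n+1} = Pi_G(X_n + a_n b(X_n, xi_n)) for a hypothetical b."""
    rng = random.Random(seed)
    target = [1.5, 0.5]   # assumed zero of the unconstrained mean field
    x = [0.2, 0.9]        # initial point inside G
    for n in range(n_steps):
        a_n = 1.0 / (n + 1)   # decreasing gains, a_n -> 0
        # bounded, zero-mean, exogenous noise (an assumption of this sketch)
        noise = [rng.uniform(-1.0, 1.0) for _ in x]
        b = [t - xi + w for t, xi, w in zip(target, x, noise)]
        x = project_box([xi + a_n * bi for xi, bi in zip(x, b)])
    return x

final = run_projected_sa()
print(final)
```

The first coordinate is held at the boundary by the projection, while the second behaves like an unconstrained averaging scheme.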

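For $x \in G$ and $v \in R^r$, define the 'projection' of $v$ at $x$

\[
\pi_G(x,v) = \lim_{\Delta \to 0} \big[\Pi_G(x + \Delta v) - x\big]/\Delta.
\]

Define the set of outer normals to $\partial G$ at $x$:

\[
n(x) = \{\gamma : \text{for all } y \in G,\ \langle \gamma, x - y \rangle \ge 0,\ |\gamma| = 1\}.
\]

Note that [26, Lemma 4.6] $\pi_G(x,v)$ equals $v$ if $x \in G^0$ (the interior of $G$) or $x \in \partial G$ and $\sup_{\gamma \in n(x)} \langle \gamma, v \rangle \le 0$ (i.e., where $v$ points inward). In general, it equals $v - \langle v, \gamma^* \rangle \gamma^*$ if $x \in \partial G$, $\sup_{\gamma \in n(x)} \langle \gamma, v \rangle > 0$ and $\gamma^*$ is the (a) maximizer.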
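
The two descriptions of the projected direction (the limit and the closed form) can be compared numerically. The following sketch assumes a hypothetical constraint set, the closed unit disc in $R^2$, for which the outer normal at a boundary point $x$ is simply $x/|x|$; the limit is approximated by a one-sided finite difference with a small positive $\Delta$. The function names are ours, chosen for illustration.

```python
import math

def project_disc(y):
    """Euclidean projection onto the closed unit disc G = {y : |y| <= 1}."""
    norm = math.hypot(y[0], y[1])
    return list(y) if norm <= 1.0 else [y[0] / norm, y[1] / norm]

def pi_G_limit(x, v, delta=1e-7):
    """Finite-difference approximation of lim [Pi_G(x + delta v) - x] / delta."""
    p = project_disc([x[0] + delta * v[0], x[1] + delta * v[1]])
    return [(p[0] - x[0]) / delta, (p[1] - x[1]) / delta]

def pi_G_formula(x, v):
    """Closed form: v if x is interior or v points inward, else v - <v, n> n."""
    norm = math.hypot(x[0], x[1])
    if norm < 1.0:
        return list(v)
    n = [x[0] / norm, x[1] / norm]   # unique outer normal on the circle
    s = v[0] * n[0] + v[1] * n[1]
    if s <= 0.0:
        return list(v)
    return [v[0] - s * n[0], v[1] - s * n[1]]

x_boundary = [1.0, 0.0]
outward = pi_G_formula(x_boundary, [1.0, 2.0])   # normal component removed
inward = pi_G_formula(x_boundary, [-1.0, 2.0])   # left unchanged
print(outward, inward)
```

For the outward-pointing direction only the tangential part survives, which is the 'along the boundary' motion discussed in this section.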

Define $H(x,\alpha,t)$ and $L(x,\beta,t)$ by (2.7) and define the 'constrained' L-functional by

\[
L_G(x,\beta,t) = \inf_{v :\, \pi_G(x,v) = \beta} L(x,v,t).
\]

For $x \notin G$ or if the infimizing set is empty, set $L_G(x,\beta,t) = +\infty$. Then $L_G(x,\beta,t) = L(x,\beta,t)$ if $x \in G^0$ or if $x \in \partial G$ and $\langle \gamma, \beta \rangle < 0$ for all $\gamma \in n(x)$ (i.e., $\beta$ points to the interior of $G$). If $x \in \partial G$ and there is $\gamma \in n(x)$ such that $\langle \gamma, \beta \rangle > 0$ ($\beta$ points 'out' of $G$), then $L_G(x,\beta,t) = \infty$. The interesting case is when $\sup_{\gamma \in n(x)} \langle \gamma, \beta \rangle = 0$; i.e., $\beta$ points 'along the boundary.' In this case, there is a true (nontrivial) minimization. Since $L(x,\beta,t)$ is l.s.c. in $\beta$ and $L(x,\beta,t) \to \infty$ as $|\beta| \to \infty$ (under the assumptions to be used), the infimum is attained. Define

\[
S_{G,x}(T,\phi) = \int_0^T L_G(\phi(s), \dot\phi(s), s)\, ds,
\]

and the ODE for the projected mean dynamics

\[
(5.11)\quad \dot x = \pi_G(x, b(x)),
\]

where $b(\cdot)$ is defined as in (A2.2).

One of the main difficulties as well as points of interest for the constrained algorithm is that in many applications the escape of $\{X_n\}$ from a neighborhood of a stable point of (5.11) will be essentially along the boundary, and when such neighborhoods are entered from the outside it is often essentially along the boundary as well.

We  will  use  the  assumption 
A5.3. (i) The noise $\{\xi_n\}$ is exogenous.

(ii) $b(x,\xi)$ is bounded (by $K$) and is Lipschitz continuous in $x$, uniformly in $\xi$.

(iii) Partition $b(x,\xi)$ into two parts, of dimension $s$ and $r - s$ resp. Define

\[
\frac{1}{n}\, \mathrm{cov} \sum_{j=1}^{n} \big[b(x,\xi_{N+j}) - b(x)\big] = \Sigma^{N,n}(x) =
\begin{bmatrix}
\Sigma_{11}^{N,n}(x) & \Sigma_{12}^{N,n}(x) \\
\Sigma_{21}^{N,n}(x) & \Sigma_{22}^{N,n}(x)
\end{bmatrix}.
\]

Then either (a) (the non-degenerate case) $\liminf_{N,n} \Sigma^{N,n}(x) > 0$ (in the sense of positive definite matrices) or (the degenerate case)


(b)  lim  E^.,n(x)  =  lim  I^’n(x)  =  lim  E^,’n(x)  =  0  and  JiniE^'n(x)  >  0. 

Remark. A5.3(iii) is not particularly restrictive in applications, since many algorithms divide naturally into components which are not directly affected by noise and those which are affected in a 'non-degenerate' manner. It and A5.3(ii) were used in [18,19] to prove the l.s.c. of $S_{G,x}(T,\phi)$, the action functional for the constant gain $a_n = a > 0$ case [19, Theorem 2]. In [18,19], we required the set $U(x) = \{\beta : L(x,\beta) < \infty\}$ (or its analog in the degenerate case A5.3(iiib)) to be continuous in the Hausdorff topology, but, in fact, this follows from A5.3(ii). The analog of this 'Hausdorff continuity' condition for our case will always hold by A5.3(ii) wherever it is needed in adapting the proofs in [18,19] to our $a_n \to 0$ case.


Theorem 5.3. Let (5.8) have a unique solution for each initial condition in $G$. Let $\theta$ be an asymptotically stable point of (5.8) with domain of attraction $A \subset G$, and let $\{X_n\}$ enter (infinitely often w.p.1) a compact set $D(\theta) \subset A$. Assume (A2.1), (A2.3), (A2.5), (A2.7) and (A5.3). Then $X_n \to \theta$ w.p.1.

If we assume in addition that given $\epsilon > 0$ there is $N_\epsilon < \infty$ such that $a_i/a_N \le 1 + \epsilon$ for all $i \ge N \ge N_\epsilon$, then

\[
(5.12)\quad \limsup_N\, a_N \log P\{X_n \notin D(\theta), \text{ some } n \ge N \mid |X_N - \theta| < \delta\}
\le - \inf_{\substack{\phi :\, |\phi(0) - \theta| < \delta \\ \phi(t) \notin D(\theta) \text{ for some } t < \infty}} S_{G,\phi(0)}(t,\phi).
\]

Remark. Rate of convergence results for the constrained algorithms are not available via the classical stochastic approximation method of 'local linearization'. This makes estimates of the form (5.12) particularly important.


Remarks on the Proof. The argument closely follows the lines of the argument of Sections 3 and 4. In [19, Theorem 2], for the 'constant gain' case, the l.s.c. of $S_{G,x}(T,\phi)$ was proved. Purely notational changes in the proof there give the l.s.c. in $\phi$ of $S_{G,\phi(0)}(T,\phi)$. Since $\pi_G(x,v) = \beta$ implies $|v| \ge |\beta|$, the compactness of (for compact $F$ and $s < \infty$)

\[
\bigcup_{x \in F} \{\phi : S_{G,x}(T,\phi) \le s\}
\]

is proved as it was in Lemma 3.1(ii). Note also that $S_{G,x}(T,\phi) = 0$ iff $L_G(\phi(s), \dot\phi(s), s) = 0$ for almost all $s$. By the definition of $L_G$ and the fact that $L(x,\beta,s) = 0$ iff $\beta = b(x)$, $L_G(x,\beta,s) = 0$ iff $\beta = \pi_G(x,b(x))$. Therefore $S_{G,x}(T,\phi) = 0$ iff $\dot\phi(s) = \pi_G(\phi(s), b(\phi(s)))$ for almost all $s$. These remarks give us the 'constrained case' analog of Lemma 3.1. Then, if (A2.6) held, with $S_{G,x}$ replacing $S_x$, we would have our convergence theorem.

Under  the  smoothness  condition  in  (A5.3ii),  the  proof  in  [18,19]  can  be 
adapted  to  get  the  necessary  form  of  (A2.6),  in  much  the  same  way  that 
Freidlin’s  proof  in  [12]  was  adapted  to  get  the  proof  of  Theorem  4.2. 


6. APPENDIX 1. ON THE DIFFERENTIABILITY OF $H(x,\alpha)$ at $\alpha = 0$ in ( A2 })


When the state process $\{X_n\}$ is Markov, or when it is one component of a Markov process (such as our 'state-dependent noise' process $\{X_n, \xi_n\}$) satisfying certain 'uniformly recurrent' conditions, it is possible to prove the differentiability of $H(x,\alpha)$ for a wide class of such processes by using analytical techniques and the characterization of $H(x,\alpha)$ as the log of the eigenvalue of largest modulus of an operator associated to the process (as in [24]). See [24] for details on how this approach may be used in a general setting. However, for many of the processes arising in the study of stochastic systems the assumptions required by this approach do not hold. As a very common example, one may consider the ARMA model to be discussed below.

In this section, we outline a method for proving the differentiability of $H(x,\alpha)$ at $\alpha = 0$ that is based on well known 'level 2' and 'level 3' large deviations results and which is general enough to cover many of the non-Markov processes encountered in recursive algorithms.

In applications, we would not want to be concerned with the abstract level results in this section. But they make it clear that the $\alpha$-differentiability assumption is not restrictive and can be treated in many different ways. We work only with the exogenous noise case and stationary and continuous deterministic random fields for simplicity of exposition. Define the sample occupation measure (over the Borel sets $\Gamma$)

ln'  r-'il  "  ^7  f  1  ■: t,[ «  l  >€f 


-f, : 


and  the  '  p 

a  .  c  r  ! 

pr  hah. in.  m c a ■  u r e s  .in 

Kr 

end"*  cd 

with  the  t  ;  !  g  •  ■  : 

wear  .  •  r. 

■■  crgencc 

The  I  .w  arc  in 

K 

Assume 

the  t  -  1 ! :  ■  *  i  r.  c  1  a  r  g  ■. 

dev  131,.  n- 

estimate 

'x  jv  fixed  th r..ughi  ut 

Ais  ; 

rn.  •-  .  ■ 

u  1  '  c  n, .>i  'll  gu!; .  i  turn 

"'u ; 

,  1  on  K 

■ 

in:  r  :h>;:  m-  ■  • :  '  >  € 

K  I  '  <  x  <.'•  .  orr.ru,  to'  s  *  ®.  for  Bo't  /  A  '1  M  anj  ,  i4,  h  ..  i  w  p  !  . 

K  f  * 

i  i  m  —  lug  P  r  '  l  .  €  4  ;■  <  -  inf  I  ( v  ■ 

N  S  •  71  " 

»£a 

s u '  '  .  ,tr  •  .  n  d .  ’  rr  I  •  i  >  Aft  •  arc  c-.nta.ncd  in  man.  p  ;  a .  v  '  ~ ; , 

A-jnic 

A6.2. There is a unique $\bar\nu_x \in M$ such that $I_x(\bar\nu_x) = 0$.

It follows from (A6.1) and (A6.2) that $L_N$ converges (w.p.1) to $\bar\nu_x$.

We now show that (A6.1) and (A6.2) imply the desired $\alpha$-differentiability. Then an example will be given, and the approach discussed.

By Varadhan's theorem on the asymptotic evaluation of integrals [21] and the boundedness and continuity of $b(x,\cdot)$, the following inequality holds (w.p.1):

\[
(6.1)\quad \lim_N \frac{1}{N} \log E_\xi \exp\Big\langle \alpha, \sum_{i=1}^{N} b(x,\xi_i) \Big\rangle
\le \sup_{\nu \in M} \Big[ \int \langle \alpha, y \rangle\, \nu(dy) - I_x(\nu) \Big] \equiv H^*(x,\alpha).
\]

Obviously $H(x,\alpha) \le H^*(x,\alpha)$. Since both functions are convex and $H(x,0) = H^*(x,0) = 0$, $H(x,\alpha)$ is $\alpha$-differentiable at $\alpha = 0$ if $H^*(x,\alpha)$ is.

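Next note that

\[
(6.2)\quad H^*(x,\alpha) = \sup_\beta\ \sup_{\{\nu \in M :\, \int y\,\nu(dy) = \beta\}} \Big[ \int \langle \alpha, y \rangle\, \nu(dy) - I_x(\nu) \Big]
= \sup_\beta \big[ \langle \alpha, \beta \rangle - L^*(x,\beta) \big],
\]

where

\[
L^*(x,\beta) = \inf_{\{\nu \in M :\, \int y\,\nu(dy) = \beta\}} I_x(\nu).
\]

Since $H^*(x,0) = 0$, $\beta^*$ is a subdifferential of $H^*(x,\alpha)$ at $\alpha = 0$ if

\[
(6.3)\quad H^*(x,\alpha) - \langle \alpha, \beta^* \rangle \ge 0, \quad \text{all } \alpha.
\]

But (6.3) holds iff $L^*(x,\beta^*) = 0$, since $H^*$ is the Legendre transform of $L^*$. Since $H^*(x,\cdot)$ is convex, it is differentiable at $\alpha = 0$ iff the set of subdifferentials at $\alpha = 0$ contains only one element. By (A6.2), $\beta^* = \int y\, \bar\nu_x(dy)$ is the unique value of $\beta$ for which $L^*(x,\beta) = 0$. Thus $\beta^* = \int y\, \bar\nu_x(dy)$ is the unique subdifferential, and the $\alpha$-differentiability is proved. Note that $\beta^* = b(x)$, as defined in (A2.2).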
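
The link between differentiability of the H-functional at $\alpha = 0$ and uniqueness of the zero of its Legendre transform can be seen in a toy scalar case. The sketch below is an assumption-laden illustration, not part of the argument: it takes i.i.d. noise equal to a hypothetical mean 0.3 plus a fair $\pm 1$ coin, for which the log moment generating function is $0.3\alpha + \log\cosh\alpha$, and evaluates the Legendre transform by brute force over a grid of $\alpha$ values.

```python
import math

MEAN = 0.3  # hypothetical mean value b(x) for one fixed x

def H(alpha):
    """Log moment generating function of MEAN + (fair +/-1 coin)."""
    return alpha * MEAN + math.log(math.cosh(alpha))

# grid of alpha values used for the brute-force supremum
ALPHAS = [i / 1000.0 for i in range(-5000, 5001)]

def L(beta):
    """Legendre transform L(beta) = sup_alpha [alpha * beta - H(alpha)] on the grid."""
    return max(a * beta - H(a) for a in ALPHAS)

h = 1e-5
slope_at_zero = (H(h) - H(-h)) / (2 * h)   # numerical derivative of H at 0
print(slope_at_zero, L(0.3), L(0.5), L(0.1))
```

In this example the slope of H at the origin equals the mean, and L vanishes at that value of $\beta$ and is strictly positive at the other values tried.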

Discussion. We have phrased our requirement in terms of $H(x,\alpha)$ at $\alpha = 0$, but as shown above this is obviously equivalent to the uniqueness of the $\beta^*$ satisfying $L(x,\beta^*) = 0$. The reason for our choice is that in most of the work on large deviations for dynamical systems [12], as well as the work generalizing Cramér's original paper [16], [24], [25], the differentiability of $H(x,\alpha)$ in $\alpha$ is taken as a fundamental assumption. As a consequence, this was the condition that was typically verified for a given noise process. See for example Lemma 3.4 of [24].


We illustrate the method with an example. Suppose that $b(x,\cdot)$ is continuous, and that $\{\xi_n\}$ is a stationary ARMA process with representation

\[
(6.4)\quad A_0 \xi_n + A_1 \xi_{n-1} + \cdots + A_{d_1} \xi_{n-d_1} = B_0 \psi_n + B_1 \psi_{n-1} + \cdots + B_{d_2} \psi_{n-d_2},
\]

where $\{\psi_n\}$ is a sequence of zero mean, bounded, i.i.d. random variables. For simplicity, we assume both $\xi_n$ and $\psi_n$ take values in $R^r$. It is also assumed that the roots of $\det(A_0 + A_1 s + \cdots + A_{d_1} s^{d_1})$ lie outside of the closed unit disc.
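
For a scalar process with $d_1 = 2$ the root condition can be checked directly with the quadratic formula. The sketch below is only an illustration under that assumption; the coefficient values are hypothetical and the helper names are ours.

```python
import cmath

def ar_roots(a0, a1, a2):
    """Roots of the polynomial a0 + a1*s + a2*s**2 (requires a2 != 0)."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2 * a0)
    return [(-a1 + disc) / (2.0 * a2), (-a1 - disc) / (2.0 * a2)]

def satisfies_root_condition(a0, a1, a2):
    """True if every root lies strictly outside the closed unit disc."""
    return all(abs(s) > 1.0 for s in ar_roots(a0, a1, a2))

# xi_n - 0.5 xi_{n-1} + 0.06 xi_{n-2} = psi_n : roots near 5 and 10/3
ok = satisfies_root_condition(1.0, -0.5, 0.06)
# xi_n - 2.5 xi_{n-1} + xi_{n-2} = psi_n : roots 2 and 0.5, so the condition fails
bad = satisfies_root_condition(1.0, -2.5, 1.0)
print(ok, bad)
```

For vector-valued processes the same check would apply to the roots of the determinant polynomial, which this scalar sketch does not attempt.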


Define $S = (R^r)^\infty$ (the space of infinite sequences with values in $R^r$), and consider the mapping $F: S \to S$ defined by ($[F(\cdot)]_j$ denotes the $j$th component)

\[
[F(\{s_n\})]_j = b(x, p_j),
\]

where $\{s_n\}$ and $\{p_n\}$ are related by

\[
A_0 p_n + \cdots + A_{d_1} p_{n-d_1} = B_0 s_n + \cdots + B_{d_2} s_{n-d_2}.
\]

We can metrize $S$ in such a way that $F$ is continuous (and in fact uniformly continuous on a subset $A \subset S$ such that $\{\psi_n\} \in A$ w.p.1). It is then relatively straightforward to show that (A6.1) and (A6.2) follow from the (so-called 'level 3') large deviations results for the process $\{\psi_n\}$ that are given in [23], under a suitable application of the 'contraction principle' (a 'continuous mapping' technique) [21; Section 2]. We omit all details here, since they would take us too far afield, and the techniques are known in large deviations theory.

In general, if a given process $\{\xi_n\}$ can be represented as a continuous transformation of a simpler process $\{\psi_n\}$ for which the appropriate 'level 3' results exist, then we may obtain (A6.1) and (A6.2) via the 'contraction principle'. In the course of doing so, we also verify (A2.1), with $H(x,\alpha)$ there replaced by $H^*(x,\alpha)$ of (6.1).

Although this approach may seem abstract, it in fact rather easily yields the $\alpha$-differentiability for a wide variety of the noise processes of interest in stochastic systems theory, which often do have such a representation.


7. APPENDIX 2. PROOF OF THEOREM 5.1 FOR $b(x,\xi) = b(x)$, SCALAR CASE

We adapt the proof in [17] for the constant $a_n = a > 0$ case. The details of the full Theorem 5.1 use a similar adaptation. Define $G_x(a) = P\{b(x) \le a\}$, and $G_x^{-1}: [0,1] \to R$ by $G_x^{-1}(v) = \sup\{a : G_x(a) \le v\}$. Let $\{v_n\}$ and $\{\rho_n\}$ be mutually independent sequences of random variables, each i.i.d., with the $v_n$ uniformly distributed on $(0,1)$, and the $\rho_n$ Gaussian with mean zero and variance $\sigma^2 > 0$. Let $p_\sigma(\cdot)$ denote the density of $\rho_n$ and $G_{\sigma,x}(\cdot)$ the convolution of $G_x(\cdot)$ and the distribution function of $\rho_n$. Then

\[
\frac{d}{da} G_{\sigma,x}(a) = \int p_\sigma(a - b)\, dG_x(b)
\]

is uniformly positive on each bounded $(x,a)$ set.

Define $G_{\sigma,x}^{-1}(\cdot)$ as $G_x^{-1}(\cdot)$ was, but using $G_{\sigma,x}(\cdot)$. Let $F_1$ denote a compact set. It follows from the weak continuity in (A5.1) that $G_{\sigma,x}^{-1}(v)$ is continuous in $(x,v) \in R \times (0,1)$ and that given $\Delta \in (0,1/2)$, $G_{\sigma,x}^{-1}(v)$ is uniformly continuous on $F_1 \times [\Delta, 1 - \Delta]$. Define $F_n(x) = G_x^{-1}(v_n)$, $F_{\sigma,n}(x) = G_{\sigma,x}^{-1}(v_n)$. Now

analogous  to  what  was  done  in  Section  4.  for  x  €  Rr  and  n  *  N.  del  me  the 

auxiltarv  processes  in  (7  ||  (here  also  )  is  piecewise  constant  on  (0. 1 )  with 


intervals  of  constancy  (iA.  iA  +  Ai  and  we  write  1 =  iM\n  -  tNh  n  >  Ni 


( 7  |  a  ) 

XN“,  -  xN-*  ♦ 

a  F  <XN*  ). 

n  n  n 

V  ^  x  « 

XN  «  X, 

(7  lb) 

xN*  -  xN-*  ♦ 

n  4  1  n 

a  F  (XN-*)  +  ac. 

n  n  n  n  n 

x^:  -  -  x. 

(7  1  c  > 

V  N.m  .  vN.x 

A  0, n*\  A  O.n 

♦  a  F„  <XN  *), 

n  O.n  n 

XaN  *  *• 

(7  Idl 

\  ^  —  \  ^,N 

On ♦  I  On 

♦  a  F  n  (t lrN). 

n  On  n 

X*N  -  x 

C  N 


Owing to the way that $G_x^{-1}$ was constructed, the distributions of the process defined by (7.1a) are equal to those of the $\{X_n^{N,x}, n \ge N\}$ process defined in (2.9). (In fact, the discussion above indicates how the random fields $b_n(x)$ could be constructed.)
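
The construction of the random field from uniform random variables is the usual inverse-distribution-function device. As a hedged illustration, the sketch below assumes that the inverse is defined by $\sup\{a : G(a) \le v\}$ and uses a hypothetical three-point distribution for $b(x)$ at one fixed $x$; feeding uniform samples through the inverse should then reproduce that distribution.

```python
import random

VALUES = [-1.0, 0.0, 2.0]   # hypothetical atoms of b(x) for one fixed x
PROBS = [0.2, 0.5, 0.3]     # their probabilities

def g_inv(v):
    """sup{a : G(a) <= v} for the discrete distribution above, 0 <= v < 1."""
    cum = 0.0
    for value, prob in zip(VALUES, PROBS):
        cum += prob
        if v < cum:
            return value
    return VALUES[-1]

def sample(n, seed=0):
    """Draw n values of b(x) by passing uniform variates through g_inv."""
    rng = random.Random(seed)
    return [g_inv(rng.random()) for _ in range(n)]

draws = sample(20000)
freqs = [draws.count(value) / len(draws) for value in VALUES]
print(freqs)
```

The empirical frequencies should be close to the assumed probabilities, which is all that the equality in distribution requires in this discrete case.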

By the definitions of $G_x^{-1}(\cdot)$ and $G_{\sigma,x}^{-1}(\cdot)$, the distributions of the processes defined by (7.1b) and (7.1c) are the same, and we will work with the latter. Note that $\{\rho_n\}$ no longer appears in (7.1c). For $\delta > 0$ and large $N$,

where $K_1$ is an upper bound for $\sup_{N \le n \le m(t_N + T)} a_n/a_N$ for large $N$.

The equivalence (in distribution) of the processes defined by (7.1b) and (7.1c) and (7.2) essentially allows us to prove the theorem by using a large deviations upper bound for (7.1c), which is 'smoothed', since $G_{\sigma,x}^{-1}(v)$ is $x$-continuous. A large deviations upper bound of the type obtained in Section 4 can readily be obtained for (7.1c), via the intermediary process (7.1d) (as in Section 4). Henceforth $x$ is confined to a compact set $F_1$.


Next, let $X^{N,x}(\cdot)$, $X_\sigma^{N,x}(\cdot)$ and $X_\sigma^{\phi,N}(\cdot)$ denote the piecewise linear interpolations as in (2.11), but for the processes defined by (7.1a,c,d). Recall part (d) of Theorem 4.2. The following set inclusion (4.12) was the key part of the proof:

\[
(4.12)\quad \{d(X^{N,x},\phi) < \delta\} \subset \{d(X^{\phi,N},\phi) < \delta_2\}.
\]

Here, we work with the set inclusion (7.3) instead (see below for proof) for appropriate $\delta$ and $\delta_2$:

\[
(7.3)\quad \{d(X^{N,x},\phi) < \delta\} \subset \{d(X_\sigma^{\phi,N},\phi) < \delta_2\} \cup N,
\]

where $P\{N\} \le \exp(-M_\sigma/a_N)$ with $M_\sigma \to \infty$ as $\sigma \to 0$ and not depending on $\phi$ or $x$. (For the general $\xi$-dependent $b_n(\cdot)$, we use the conditional probability, as in Theorem 4.2, and all upper bounds are uniform in $\omega$ w.p.1.)

Define

\[
H_\sigma(x,\alpha,t) = H(x,\alpha,t) + K(t)\sigma^2\alpha^2/2
\]

and the associated L and S functionals $L_\sigma$ and $S_{\sigma,x}$. Owing to the added $\rho_n$ in (7.1b), $H_\sigma$ is the proper H-functional for the processes defined by (7.1b) and (7.1c). It is enough to work with the inclusion (7.3) instead of (4.12) as in Theorem 4.2, owing to the inequality (7.2) and the equivalence (in distribution) of the processes defined by (7.1b) and (7.1c). Now, the same arguments that were used in Theorem 4.2 imply Assumption (A2.6), but with $S_x$ replaced by $S_{\sigma,x}$. By [17, Lemma 1],

\[
\lim_{\sigma \to 0}\ \inf_{\phi \in A} S_{\sigma,x}(T,\phi) \ge \inf_{\phi \in A} S_x(T,\phi).
\]

The  last  two  sentences  yield  the  theorem.  Thus,  only  the  set  inclusion  (7.3) 
needs  to  be  shown.  This  inclusion  is  proved  in  exactly  the  same  way  as  (2.6) 
in  [17]  is  proved,  with  aj  or  aN  replacing  c,  Xp,x  replacing  X^  and  x£'N 
replacing  X^,  and  we  omit  the  details.  □ 